PREFACE TO THE FIRST EDITION

The field of statistics is many sided and ranges over different levels. 
However, between the levels of clerical work at one extreme and 
mathematical research at the other extreme, there is a well-defined
methodology, mathematical in nature, which underlies the specialized 
applications in the departments of economics, psychology, education, 
and biology. 

This book is an elementary text dealing with the mathematics of 
statistics. Fortunately, a considerable part of the descriptive
methodology of statistics can be understood by those having relatively
little knowledge of college mathematics. Although no mathematics
beyond the ordinary Freshman course in college algebra is required 
for a profitable reading of this text, a certain degree of mathematical 
maturity and intelligence is presupposed. To achieve the maximum 
success perhaps only the best of those students whose mathematical 
preparation is limited to the minimum prerequisite should be encouraged
to study it. Occasionally, material is introduced to sharpen
the interest and challenge the ability of the more advanced student 
without interrupting the main developments or discouraging those 
less mature. 

In writing this book, considerable selection of material necessarily 
had to be made. The omission of certain topics will be noted in the 
table of contents. Judging from my own experience, and that of
others, the theory of sampling cannot be taught satisfactorily at the 
level for which Part I is intended. At best only a superficial use of 
formulas could be hoped for. Consequently, I have elected to defer 
this subject to Part II where a systematic treatment can be given.
With regard to time series analysis, Professor J. Neyman says in his 
Lectures And Conferences On Mathematical Statistics (p. 106), 

We start by trying to split each of the series into several parts, which we
arbitrarily assume to be additive. One of these parts is the trend, which we
estimate perhaps by fitting a low order parabola to the whole series available.
The next part is the “business cycle.” The third part is the “seasonal
variation,” which we frequently estimate by calculating moving averages. Finally,
the remainder is considered to arise from random causes, and we concentrate
on the question whether such a remainder in one of the variables is correlated
with that in some other. All this procedure seems to me very artificial and
arbitrary. ... In my opinion the whole problem of time series must be treated
from a point of view that is quite different from the traditional one just described.

I concur in this opinion and I believe that no useful purpose would 
be served by drilling students in the traditional procedures. 

Throughout the book the student is encouraged and stimulated to 
master fundamental principles and concepts. Essentially, the job 
of every statistician is to take hold of situations and disentangle 
them by the techniques of the science. Therefore, considerable 
emphasis is placed on technique. I have tried to develop in the 
student the ability to use symbolism creatively as a language. 
Numerous examples are given to clarify concepts and illustrate 
processes. Over two hundred exercises are included. It is intended 
that these exercises should be handled as in a mathematics course. 
No laboratory, so-called, is necessary. 

Nowadays, no little importance is attached to motivation. I have 
constantly held in mind the necessity of making the subject interesting
and stimulating to the beginning student. Nevertheless, I venture
the opinion that the best motivation for intelligent students is
the feeling that their teacher knows his subject. 

In preparing the manuscript a large number of books and papers 
have been examined and perhaps leaned upon. No claim to originality
is made except possibly in the matter of arrangement and
pedagogical approach. Numerous references to the scholarly achievements
of others are cited. It is hoped that the serious student will
read some of these and thereby widen his perspective and enhance 
his interest. 

In conclusion, I wish to express my deep appreciation to Professor 
Allen T. Craig and Dr. Mason E. Wescott who critically read the 
manuscript and made many suggestions for its improvement. 


Evanston, Illinois. 
April, 1939 


John F. Kenney 
MATHEMATICS OF STATISTICS

INTRODUCTION 

1. Definition. The word statistics is used in at least two different 
senses. Construed as plural it refers to the systematic presentation 
of quantitative data. Used in a singular sense, the word statistics 
refers to the science which has for its object the classification and
analysis of quantitative data so that intelligent judgments may be
passed upon them.¹

It is usually clear from the context which meaning is intended,
although some persons prefer the expression statistical methods
for this second meaning. Statistical methods are all those devices
used in the collection and analysis of data. The theory of statistics
is the exposition of statistical methods and is of a mathematical
nature. 

2. Scope. There used to be a widespread misapprehension that 
statistics is a branch of economics. As a matter of fact, statistical 
problems arise in many different fields: biology, economics, engineering,
insurance, education, physics, and astronomy, as well as
various branches of business. The exploration of certain aspects of
nearly every field involves some phase of statistical theory. Indeed,
certain types of statistical methodology may have almost unexpected
applications: the discovery, for example, that the life of physical
property² is governed by much the same statistical rules as govern
the lives of human beings, and hence, that life tables may be applied
to both. Physicists have discovered that many of the problems in
the modern theory of the structure of the atom are essentially statistical
in nature. In recent years industrial companies have placed
an increasing reliance on statistical methods in controlling the 
quality of goods during manufacture. 

¹ In addition to the two meanings given above, another has crept into the
recent literature where reference is made to a statistic. This term will be
explained later.

² Life Expectancy of Physical Property — E. B. Kurtz. Ronald Press.

Statistics as a science is making contributions to all the sciences.
On the other hand, some sciences like biometry and physics have
contributed much in the development of statistics and its terminology.
The following quotation from Science may appropriately be mentioned
here:

The extension of the scope of quantitative methods through the medium of
statistical analysis is one of the most significant things going on in the scientific
world at the present time.¹

The importance of statistical method in present-day thinking has 
been well stated, as follows: 

More and more the modern temper relies upon statistical method in its
attempts to understand and to chart the workings of the world in which we live.
Particularly in those sciences which deal with human beings, whether in their 
physical and biological aspects or in their social, economic, and psychological 
relations, the spirit of our time asks that its conclusion be based not so much 
upon the distinctive reactions of one or two individuals as upon the observation 
of large numbers of individuals, the measurement of their common likenesses and 
the extent of their diversity. As the data thus gathered from mass phenomena 
become extensive, it becomes imperative to have methods of organization to 
bring the facts within the compass of our understanding, methods of analysis 
to make the essential relations appear out of the mass of detail in which they 
are hidden, and methods of classification and description to facilitate the
presentation of the data for the study and consideration of other persons. Thus
statistical method becomes a telescope through which we can study a larger
terrain than would be accessible to our unaided vision.²

3. Statistical Methods in the Social Sciences. Because statistics 
is fundamentally the study of aggregates of individuals, rather than 
of individuals, whether these individuals be observations or measurements
or persons, it is apparent that statistical methods are essential
to social studies. Indeed it has been said that it is principally by the 
aid of such methods that these studies may be raised to the rank of 
sciences. 

This particular dependence of social studies upon statistical methods 
is mentioned in a recent book³ from which we quote the following:

If, as seems probable, our present uncoordinated large-scale business is to be
further developed into an efficiently managed instrument of production serving
the needs of the people, then statistics, together with mathematical economics,
will emerge among the most important tools of the social sciences. For it is by
means of averages, dispersions, coefficients of variability, trends, and regressions,
as pictured in control charts, that management is able to visualize and direct the
movements of large masses of population.

¹ Science. January 18, 1929.

² Mathematics and Statistics — Walker. Sixth Yearbook, National Council of
Teachers of Mathematics.

³ Reprinted by permission from Methods of Statistical Analysis by Davies and
Crowder, published by John Wiley and Sons, Inc.

The work of the statistician is much like that of the map maker who presents 
the traveler with a sketch of important highways, showing the locations of towns 
and geographical features. The map is not a picture of reality. It shows cities 
as dots, and rivers as lines. It has purposely omitted the interesting details of 
scenery and the still more important features of human interest which lie along 
the route and which constitute the traveler's real objectives. Nevertheless, as 
a means of reaching these objectives, the map is extremely useful. And so it is 
with statistics in the hands of the business executive and statesman. Back of the 
charts are human beings with their varying characteristics and vital interests, 
few of which can be described in figures. Yet as a means of serving these interests,
of keeping trade moving from one region to another, of allocating investment and
labor, and of apportioning relief to maladjusted industries and dependent classes,
statistics and mathematical methods are important, and are becoming increasingly
important with the growing complexity of society.

It may be said that the study of statistics is not merely an attempt to
describe what actually occurs, though it must begin at this point, but in its broader
aspects it is the logical background of business and social management. Hence
what appears now to be mere abstraction may later become the basic necessity
of an applied science. Eventually, it may be assumed, the social arts of business
and politics will rest upon as substantial a theoretical and mathematical
background as physics, chemistry, and engineering.

4. Mathematics and Statistics. Statistical problems are of interest,
therefore, not only to the worker in the particular field but also
to the mathematician, inasmuch as methods adequate to the treatment
of these problems can best be presented in the precise and
accurate language of mathematics. Moreover, statistical methods
are grounded in statistical theory which is a branch of applied
mathematics.

Although it is true that some statistical problems are ultimately 
problems in advanced mathematics, many of which mathematicians 
have not yet been able to solve, nevertheless a large and interesting 
part of statistical analysis requires mathematics no more advanced 
than elementary algebra. 

It has been said that sooner or later every true science tends to 
become mathematical. The notation of mathematics is simply a 
language and it is not limited to any particular field of knowledge. 
The following quotations are inserted to help the student approach 
the study of statistics in the proper spirit. 

1. Mathematics, the science of the ideal, becomes the means of investigating, 
understanding, and making known the world of the real. — White. 
2. Probably among all the pursuits of the university, mathematics preeminently
demands self-denial, patience, and perseverance.... — Todhunter.

3. From time immemorial, there has been but one way to become a mathematician
and there will never be another: it is a way interior to the subject and
involves years of assiduous toil. Short-cuts to mathematical scholarship there 
are none, whether the seeker be a philosopher or a king. — Keyser. 

4. Will is the creative force. Without the will to learn there is no learning. 
And when the will is feeble and confused, learning lags. — Mursell. 

5. The theory of statistics is not easy, not so much because it is abstruse, as
because the ideas are new to most people, and a good deal of hard thinking and
patient work will be necessary.... Statistical work always involves a lot of
computing [and] there is no better way of learning statistics than by working
through examples. — Tippett. 

5. Problem Assignments. The student should realize at the outset
that statistical methods are not substitutes for thinking but are
aids and supplements to it. A superficial knowledge of statistical 
technique cannot take the place of good judgment. Mere ability to 
substitute in formulas should not be confused with genuine statistical 
sophistication and insight. To the serious and capable student who 
intends to master this course, formulas will be a set of functioning 
concepts and tools rather than machines into which material may be 
fed to grind out a meaningless answer. 

This opportunity is also taken to point out that even mathematical
discourse consists of sentences. Punctuation should not be
omitted in sequences of equations and other mathematical statements.
(It is admitted, however, that many of us find this difficult
to remember.) 

Throughout the book exercises are inserted to give the student an 
opportunity to test his knowledge of the theory and methodology, 
and to develop his power of analysis. In grading the solutions, value 
will be attached to accuracy, thoroughness, neatness, and systematic 
arrangement of the work. 

6. Calculating Machines.¹ A full description of the parts of a calculating
machine and their operation may be obtained from an Instruction
Book which is furnished by the manufacturer, so only a
brief description will be given here.

A calculating machine is constructed to add and subtract. By
means of continued addition or subtraction, operations involving 
multiplication, division, and square root can also be performed with 
great speed. 

¹ The early history of modern computing machines is outlined in the American
Mathematical Monthly, vol. 31 (1924), pp. 422-429.
In addition to a keyboard on which numbers can be punched, most
machines have a sliding carriage, carrying two dials one above the
other. These dials are called revolution register (upper dial) and
product register (lower dial). In finding a product nx, one of the
factors n is punched on the keyboard and as the motive crank at the
side is turned,² the other factor x appears on the upper dial. The
product nx is then read from the lower dial.

An important property of the modern calculating machine is its
adaptability to short cuts and combinations of operations. For
example, one may multiply two numbers n and x together and add the
result to a third number k without tabulating the intermediate steps.
This is accomplished by punching the number k on the keyboard,
transferring it to the lower dial (product register), and then proceeding
as in finding the product nx. The result nx + k is then read
from the lower dial. An extension of this procedure is especially
useful in a series of computations where k and n are constant and
various values are assigned to x. To describe the procedure, suppose
it is required to calculate the successive values of 12 + 6x for
x = 5, 7, 15, 12, etc. The number k = 12 is first registered on the
lower dial, then the factor n = 6 is placed on the keyboard, and by
turning the crank forward five times to make the first value of x =
5 appear on the upper dial, the result 12 + 6 × 5 appears on the
lower dial. Instead of clearing the dial, the crank is now turned
forward twice more to rebuild the value x = 5 into x = 7, and the
result 12 + 6 × 7 can be read from the lower dial. In rebuilding
x = 15 into x = 12 the crank is turned backwards. This procedure
can be repeated until all the required values of 12 + 6x have been
calculated. A process of this sort is called the continuous method of
calculating.
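
The continuous method can be imitated in a few lines of Python. The following is only a sketch: the function name and the two register variables are illustrative, and it assumes one crank turn per unit change in x, ignoring the carriage shifting a real machine would use for multi-digit factors.

```python
def continuous_method(k, n, xs):
    """Compute k + n*x for each x without clearing the registers between values."""
    upper = 0      # revolution register: the current value of x
    lower = k      # product register: k is transferred here first
    results = []
    total_turns = 0
    for x in xs:
        turns = x - upper       # positive = crank forward, negative = backward
        lower += n * turns      # each turn adds (or subtracts) n once
        upper = x
        total_turns += abs(turns)
        results.append(lower)
    return results, total_turns


values, turns = continuous_method(12, 6, [5, 7, 15, 12])
print(values)  # [42, 54, 102, 84]
print(turns)   # 18
```

Under the same one-turn-per-unit assumption, clearing the dial before each value would take 5 + 7 + 15 + 12 = 39 turns instead of 18, which is the saving the text describes.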

In most of the exercises in this course, the computations are not 
laborious and calculating machines are not required. However, if 
machines are available they may be used to advantage in Chapters 
IV and VI. The student who desires to develop skill on a calculating
machine should begin now to study an Instruction Book and
practice the fundamental operations explained there. 

² The beginner will probably wish to practice on a manually operated machine
before attempting to use the high-speed electric and automatic machines.

7. Collateral Reading. Perhaps no single textbook can meet all
the needs of all students of statistics. There are several good books
on elementary statistics which, although not fundamentally different,
present different points of view on certain topics and treat them with
varying degrees of emphasis depending upon the field of major interest.
At least some of the books listed below should be readily available
on the reserve shelf of the library. The list should be useful to
those who wish to study more fully certain details in which they may
be interested.

1. Bivins — The Ratio Chart in Business, Codex Book Co. 

2. Burgess — The Mathematics of Statistics, Houghton Mifflin and Co. 

3. Camp — The Mathematical Part of Elementary Statistics, D. C. Heath 
and Co. 

4. Deming — Statistical Adjustment of Data, John Wiley & Sons, Inc. 

5. Freeman — Industrial Statistics, Wiley.

6. Garrett — Statistics in Psychology and Education, Longmans, Green 
and Co. 

7. Glover — Tables of Applied Mathematics. Wahr.

8. Haskell — Graphic Charts in Business. Codex Book Co. 

9. Mills — Statistical Methods, Revised. Henry Holt and Co.

10. Pearl — Medical Biometry and Statistics, W. B. Saunders and Co. 

11. Rider — Statistical Methods, Wiley. 

12. Scarborough — Numerical Mathematical Analysis, The Johns Hopkins 
Press. 

13. Snedecor — Statistical Methods. Collegiate Press, Inc., Ames, Iowa. 

14. Treloar — Statistical Reasoning, Wiley. 

15. Walker — Elementary Statistical Methods. Holt.

16. Yule and Kendall — The Theory of Statistics, Griffin and Co. 



CHAPTER I 

FREQUENCY DISTRIBUTIONS 

1. Variables and Constants. A variable is a number symbol 
which may take on any value in a set of values which is called its 
range. A constant is a symbol whose range consists of only one value
(in a particular discussion or situation). Letters toward the end of
the alphabet, such as x, u, and v, are commonly used to denote
variables. When a constant does not have a definite value such
as 3, π, and so forth, it is customary to represent the constant by a
letter toward the beginning of the alphabet. 

Two famous constants are 

π = 3.14159...     e = 2.71828...

They occur in mathematics in many important, interesting, and 
even curious ways. As instances of the latter, the following examples
are noteworthy.

$$e = 2 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots, \quad \text{where } n! = n(n - 1)(n - 2) \cdots 1.$$

$$\frac{\pi}{4} = \cfrac{1}{\;\cdots\;}$$

The expression for e is called a convergent infinite series and that for
π/4 a continued fraction.
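
Both kinds of expression can be checked numerically. The sketch below sums the series for e term by term and evaluates one classical continued fraction for π/4, Lord Brouncker's 1/(1 + 1²/(2 + 3²/(2 + 5²/(2 + ···)))). That this is the particular fraction printed in the text is an assumption, and the function names are illustrative only.

```python
import math


def e_partial_sum(terms):
    """Return 2 + 1/2! + 1/3! + ... using `terms` factorial terms after the 2."""
    total = 2.0
    factorial = 1
    for n in range(2, terms + 2):
        factorial *= n
        total += 1.0 / factorial
    return total


def brouncker_pi_over_4(depth):
    """Evaluate Brouncker's continued fraction for pi/4, truncated after `depth` levels.

    Assumption: this is used as a stand-in for the text's continued fraction.
    """
    tail = 2.0
    for j in range(depth, 0, -1):
        odd = 2 * j - 1
        tail = 2.0 + odd * odd / tail
    # The outermost partial denominator is 1 rather than 2.
    return 1.0 / (tail - 1.0)


print(e_partial_sum(10), math.e)
print(brouncker_pi_over_4(1000), math.pi / 4)
```

The series for e settles within a few terms, whereas this continued fraction approaches π/4 slowly, so many levels are needed for even three correct decimals.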

2. Variates. In general, statistical data are obtained by taking
observations or measurements on one or more variables. The values
thus obtained are sometimes called variates.¹ For example, in computing
the average monthly rainfall of a region the variable is rainfall
and the amount of rainfall for any month is a variate. Likewise,
if the bank clearings of the city of Madison are under consideration,
then the variable is bank clearings, and the clearings for any
specified interval are variates. If we denote a variable by x then
the N values which it takes on are denoted by x_1, x_2, ..., x_N.

¹ A somewhat different usage of this term is explained in Part II.

Variates are of two kinds: continuous and discrete. Continuous 
variates are values of a variable which, theoretically, can be measured
to any degree of fineness, such as heights, weights, temperatures,
ages. All the numbers between x = 0 and x = 1 form a set of continuous
variates. But if we restrict x to the rational numbers in
this interval we have a set of separate and distinct values with 
“vacant” spaces between them. Values of a variable which are 
thus restricted to particular values in order to have any meaning 
are called discrete variates. Other examples of discrete variates are: 
size of families, closing prices of stocks, “successes” in tossing a coin.
A set of discrete variates is usually obtained by counting whereas 
continuous variates are usually obtained by measurement. 

3. Accuracy of Measurements. In the case of continuous variates,
the observed values as recorded can never be absolutely established
by measurement. Thus, the height or weight of an object can
be measured only approximately, the error depending upon the precision
of the instrument and the care and accuracy of the observer.
However, it is not always necessary that measurements be recorded 
as accurately as it is possible to make them. Similarly, in the case 
of discrete variates the standard of accuracy used may be less
than it is possible to obtain. In population statistics, for example, 
it may be sufficient to record the numbers to the nearest 
thousand, with three zeros at the end to fill out to the decimal point. 
Thus, 

City Population 

A 326,000 

B 729,000 

On the other hand, the exact number of students in a university 
might be required. The degree of accuracy needed is determined 
by the purpose of the investigation and it is limited by the closeness
with which the variables can be measured. 

It follows, therefore, that the degree of accuracy in the final result
of a problem involving computations is limited by that of the original
data. Students sometimes carry results of problems to five or more
decimal places when the original data do not justify more than two
places. A table of measurements which constitutes
the raw data for a statistical investigation should always specify the
degree of accuracy in the readings. Thus, if monthly rainfall is being
measured to the nearest hundredth of an inch, and one measurement
seems to be exactly 5 inches, it should be recorded as 5.00 inches, with
two zeros. A measurement that is merely recorded as 5 means it is
correct to the nearest integer and its true value lies between 4.5 and
5.5, whereas 5.00 means the true value is known to lie between 4.995
and 5.005. The three digits in 5.00 are said to be significant.
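
The rule that a recorded value implies an interval of half a unit in the last recorded place on either side can be sketched in Python. The helper name is hypothetical, and the sketch assumes the reading is given as a decimal string so that trailing zeros are preserved. It does not handle values such as 326,000 rounded to the nearest thousand, where the zeros merely fill out to the decimal point.

```python
from decimal import Decimal


def implied_interval(recorded):
    """Return the (low, high) bounds implied by a recorded measurement string."""
    value = Decimal(recorded)
    exponent = value.as_tuple().exponent       # 0 for "5", -2 for "5.00"
    half_unit = Decimal(1).scaleb(exponent) / 2
    return value - half_unit, value + half_unit


print(implied_interval("5"))     # (Decimal('4.5'), Decimal('5.5'))
print(implied_interval("5.00"))  # (Decimal('4.995'), Decimal('5.005'))
```

Passing the reading as text rather than as a float is the point of the design: the float 5.0 cannot distinguish "5" from "5.00", which is exactly the distinction the paragraph above insists on.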

4. Necessity for Classification. After the data have been collected
in any statistical investigation the first step has to do with
introducing order in the raw material. Usually we have some hundreds
of variates which have been recorded merely in the arbitrary
order in which the observations or measurements happened to be
made. But in order to analyze a series of variates so that intelligent
judgments may be formed about it or that comparisons may be made
between two series of variates, proper classification is necessary and
of prime importance.

Such classification is not always an easy thing to effect, because it 
is the one part of statistical methods for which no very definite rules 
can be given. Most people, until they have tried, imagine that to 
collect and arrange data in classes and in tables is a straightforward 
procedure involving no great technique or experience. Although 
much can be learned from a careful study of the illustrations and discussions
that appear in the following pages and the compilations of
reputable bureaus such as the census volumes, nevertheless, experience
is the best teacher in effecting the most appropriate classification
for any set of variates. 

5. Tabulation. In carrying out the process of classification, it
becomes natural to arrange the results in tabular form, setting forth
clearly and explicitly the statistics one wishes to present. In drawing
up any table the following general rules should be observed:

(1) Every table must be self-explanatory. To accomplish this 
the title should be short, but not at the expense of clearness. 

(2) Full explanatory notes, when necessary, should be incorporated 
in the table, either directly under the descriptive title and 
before the body of the table, or else directly under the form. 

(3) The columns and rows should be arranged in a logical order to 
facilitate comparisons. 
(4) In tabulating long columns of figures, spaces should be left
after every five or ten rows. Long unbroken columns are confusing,
especially when one is comparing two numbers in a
row but in widely separated columns.

(5) If the numbers tabulated have more than three significant 
figures, the digits should be grouped in threes. Thus, one 
should write 4 685 732, not 4685732. 

(6) Double lines at the top (or at the top and bottom) may enhance
the effectiveness of a table. If the table nicely fills
the width of the page, no side lines should be used. In such
cases the omission of the side lines will have the tendency to
emphasize the other vertical lines and cause the interior columns
to stand out better. The columns should not be widely
separated and the form of a narrow, compact table should 
have its side lines. 

The following points are particularly important in practical work:

(7) Source of data should be included.

(8) Units of the data presented should be clear. 

(9) Accuracy of transcription must not only be striven for but 
actually achieved. A reader who finds one error (even though 
this be the only one) is likely to disparage the whole table. 


Table 1 — Grades of 100 Students in Freshman Mathematics 
6. Frequency Distribution. From the standpoint of a mathematical
analysis of statistics, the most important form of tabulation is
the so-called frequency distribution. Rough data do not present
any clear ideas of description unless they are organized and condensed
in a systematic way. We therefore partition the raw data into
classes of appropriate size, showing the corresponding frequency of
variates in each class. When any set of statistics is systematically
arranged in this way it is called a frequency distribution. For example,
upon an examination of the raw data of Table 1, it is difficult
to state any very definite conclusions as to whether these grades represent
preponderantly good students or poor ones. The frequency
distribution of Table 2, however, does give us more precise information.

Table 2 — Frequency Table of 100 Grades

Class Limits    Tally Marks                                 Frequency
30-39           //                                              2
40-49           ///                                             3
50-59           ///// ///// /                                  11
60-69           ///// ///// ///// /////                        20
70-79           ///// ///// ///// ///// ///// ///// //         32
80-89           ///// ///// ///// ///// /////                  25
90-99           ///// //                                        7
Total                                                         100

We see at a glance that there were 32 students with grades
between 70 and 80, and that all but 16 had grades of 60 or above. In
Table 3, the confusion of detail is still more apparent. The corresponding
frequency distribution is given in Table 4.

The width of a class is called the class interval, and in general
the successive class intervals should be of equal width. The midvalue
of such an interval is variously called the class mark, mid-value,
or central value. The width of a class interval is therefore
seen to be the common difference between two consecutive class
marks. It is also the difference between the lower (or upper)
limits of two successive classes. Thus, in Table 4, the class interval
is half an inch and the successive class marks are 0.245, 0.745,
etc., inches.
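
As a small illustration of how such a table is built, the sketch below tallies the twelve readings for 1890 from Table 3 into half-inch classes and computes each class mark as the mean of the class limits. The function and variable names are illustrative, and readings are assumed to be recorded to the nearest hundredth of an inch.

```python
from collections import Counter

# Monthly rainfall (inches) for 1890, copied from Table 3.
rainfall_1890 = [2.76, 0.75, 1.80, 1.83, 2.20, 7.99,
                 0.30, 2.29, 1.44, 2.11, 1.56, 0.31]

WIDTH_HUNDREDTHS = 50  # class interval of half an inch, in hundredths


def class_index(reading):
    """Index of the class containing a reading recorded to 0.01 inch."""
    hundredths = round(reading * 100)  # integers avoid floating-point edge cases
    return hundredths // WIDTH_HUNDREDTHS


def frequency_table(readings):
    """Return rows of (lower limit, upper limit, class mark, frequency)."""
    counts = Counter(class_index(r) for r in readings)
    table = []
    for i in sorted(counts):
        lower = i * WIDTH_HUNDREDTHS / 100
        upper = lower + 0.49
        mark = (lower + upper) / 2  # e.g. (0.00 + 0.49) / 2 = 0.245
        table.append((round(lower, 2), round(upper, 2), round(mark, 3), counts[i]))
    return table


table = frequency_table(rainfall_1890)
for row in table:
    print(row)
```

Only classes that actually contain readings are listed here. A full table such as Table 4 would also show the empty classes, so that the class marks advance by a constant half inch.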

7. Class Intervals. Grouping variates into the most appropriate 
number of classes is a matter of judgment. The choice of intervals 
to be used in tabulating any particular set of variates depends upon 
the nature and characteristics of the data and the purpose for which
it is to be used. In the case of discrete variates, the unit is a natural 
interval and sometimes it is satisfactory. (See Tables 10 and 11.) 
However, for both discrete and continuous variates the following 
conditions should guide the choice: (a) We desire to be able to treat 
all the values assigned to any one class, without serious error, as if 
Table 3 — Monthly Rainfall at Iowa City, 1890-1925
Year Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec.

1890 2.76 0.75 1.80 1.83 2.20 7.99 0.30 2.29 1.44 2.11 1.56 0.31 

1891 1.49 1.30 4.41 1.11 4.46 2.80 3.01 3.46 2.33 1.63 2.93 2.72 

1892 1.46 1.23 3.16 4.30 9.23 8.29 6.20 2.60 1.18 1.02 1.38 2.84 

1893 1.18 1.75 2.82 4.37 1.79 3.01 3.56 1.64 3.07 1.98 1.75 1.52 

1894 1.95 1.64 2.03 2.72 3.09 2.40 0.90 2.40 4.96 2.30 1.80 0.98 

1895 2.37 0.64 1.25 1.66 4.26 1.10 10.10 1.77 3.43 1.38 1.78 2.84

1896 0.70 1.51 0.92 5.14 4.10 1.86 7.04 2.44 1.82 2.74 1.16 0.55 

1897 3.66 1.30 2.07 4.60 3.11 2.38 3.83 1.85 3.54 0.33 1.98 2.48

1898 4.62 1.15 3.02 2.89 4.80 3.26 2.27 2.85 2.54 4.38 1.10 0.53 

1899 0.59 1.82 1.43 3.23 9.49 4.60 3.78 2.39 0.93 1.66 1.15 1.93 

1900 0.73 2.20 3.32 3.31 4.31 2.18 5.25 6.27 4.35 3.61 1.43 0.75 

1901 1.07 1.97 3.62 2.36 1.54 3.33 1.29 0.66 2.56 1.78 0.79 2.34

1902 1.29 0.85 1.29 1.91 3.75 7.46 6.89 10.91 5.87 3.12 2.25 2.21 

1903 0.67 1.03 1.86 3.11 6.90 1.95 4.76 3.45 5.38 3.60 0.97 1.27 

1904 1.74 0.84 2.73 5.49 2.68 2.14 2.49 3.93 3.12 1.59 0.25 1.96 

1905 1.22 1.90 2.28 3.36 5.37 6.68 3.59 2.62 1.54 5.36 2.92 1.04

1906 2.51 1.73 2.25 1.83 2.33 3.64 1.42 5.34 0.89 1.48 3.08 1.64 

1907 2.12 0.22 1.59 1.58 5.47 6.04 9.21 2.98 2.85 0.86 1.07 0.53 

1908 0.32 2.08 2.94 2.78 7.78 2.87 5.40 7.47 1.82 1.99 1.84 0.43 

1909 1.97 1.09 2.00 7.21 4.40 4.58 5.75 1.88 2.43 1.59 4.88 2.52 

1910 1.79 0.39 0.28 2.66 3.57 0.98 2.22 4.98 3.87 0.57 0.69 0.46 

1911 0.87 4.82 1.30 3.02 4.74 2.98 3.70 4.27 5.07 2.78 3.01 2.29 

1912 0.26 1.21 2.30 3.60 2.88 2.60 3.60 3.62 2.67 3.54 1.11 0.75 

1913 1.19 1.42 2.69 1.83 6.91 6.28 0.39 2.97 3.19 3.66 0.46 1.02 

1914 1.28 0.93 2.63 2.37 4.87 5.32 1.53 2.99 7.97 1.65 0.37 1.89 

1915 2.16 2.42 0.92 0.65 7.65 4.33 8.11 1.80 9.31 1.84 1.80 0.80

1916 3.18 0.59 5.06 1.83 5.99 3.92 1.67 2.83 3.49 3.19 1.42 1.15 

1917 1.09 0.19 2.19 3.43 7.33 6.49 2.84 2.79 6.23 2.28 0.30 0.57

1918 1.10 1.46 0.33 3.43 6.22 8.36 4.87 6.72 2.00 2.05 2.10 1.62 

1919 0.08 2.63 2.65 4.28 4.49 7.07 1.03 2.67 6.10 4.01 3.84 0.61 

1920 0.84 1.33 4.22 4.75 3.76 2.86 2.79 2.90 1.20 0.98 1.80 2.45 

1921 0.35 0.49 2.46 6.20 4.44 2.46 3.59 8.61 7.83 2.47 0.74 3.19 

1922 1.11 1.46 2.18 3.49 5.52 0.28 6.46 1.03 2.91 1.06 5.28 0.49 

1923 1.09 0.67 4.83 0.86 2.63 6.21 2.37 4.01 9.27 2.35 1.13 0.73 

1924 1.35 0.83 2.10 1.09 1.69 8.71 3.67 5.67 2.60 1.64 0.93 1.75 

1925 0.29 1.04 0.99 3.07 1.06 6.61 3.63 3.14 5.69 3.90 1.00 1.66 

Table 4 — Frequency Table of Monthly Rainfall at Iowa City,
1890-1925

Class Interval    Mid-x     Frequency
 0.00- 0.49       0.245        23
 0.50- 0.99       0.745        42
 1.00- 1.49       1.245        58
 1.50- 1.99       1.745        62
 2.00- 2.49       2.245        49
 2.50- 2.99       2.745        47
 3.00- 3.49       3.245        32
 3.50- 3.99       3.745        27
 4.00- 4.49       4.245        18
 4.50- 4.99       4.745        15
 5.00- 5.49       5.245        14
 5.50- 5.99       5.745         7
 6.00- 6.49       6.245        10
 6.50- 6.99       6.745         6
 7.00- 7.49       7.245         6
 7.50- 7.99       7.745         5
 8.00- 8.49       8.245         3
 8.50- 8.99       8.745         2
 9.00- 9.49       9.245         5
 9.50- 9.99       9.745         0
10.00-10.49      10.245         1
10.50-10.99      10.745         1
Total                         432

they were equal to the class mark for that interval; e.g., as if all 
23 items in the first class of Table 4 were exactly 0.245 inches, etc. 
(b) For convenience and brevity we desire to make the interval as 
large as possible subject to the first condition. These conditions will 
generally be fulfilled if the interval is so chosen that the whole 
number of classes lies between 10 and 25. A small number of classes 
may “cover up” too much detail whereas a large number may 
reveal too much detail for one to comprehend readily (which is 
just the objection to the table of original data). A preliminary 
inspection of the data should accordingly be made and the highest 
and lowest values selected. Dividing the difference between these 
by the tentative number of classes, we have our approximate value 

Table 5 — Monthly Rainfall at Des Moines, 1890-1925

Year  Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.

1890  2.62  1.17  0.91  0.78  3.00  4.91  1.10  3.35  1.57  4.48  0.74  0.11
1891  1.82  1.13  2.25  2.12  3.29  5.60  2.78  4.22  1.64  2.41  1.34  1.54
1892  1.60  1.35  2.47  3.36  8.77  3.41  8.64  2.45  1.12  2.54  0.76  1.95
1893  0.56  1.28  1.15  5.61  2.84  4.69  3.55  1.60  1.33  0.22  1.51  1.30
1894  1.09  1.39  1.78  1.70  1.41  1.67  0.29  1.89  4.46  2.24  0.99  1.15
1895  1.30  0.60  0.50  3.41  2.86  5.26  3.10  3.57  3.20  0.29  0.85  1.86
1896  0.60  0.79  1.24  3.47  6.50  2.69  8.15  5.49  3.61  2.69  1.10  0.85
1897  2.02  0.71  2.13  7.37  2.31  3.15  2.88  1.77  1.56  0.85  0.34  1.98
1898  1.59  0.82  1.35  2.64  4.22  6.85  1.86  1.09  1.91  3.56  1.87  0.57
1899  0.29  0.57  1.04  2.22  6.71  3.53  3.20  3.53  1.17  0.59  1.76  2.12
1900  0.20  0.50  3.07  3.82  4.76  4.89  5.15  8.02  3.66  3.08  0.96  0.35
1901  1.01  1.11  3.02  2.26  1.40  2.41  1.72  0.67  2.60  2.14  0.40  1.03
1902  0.91  0.52  1.15  1.55  4.69  7.27  5.95  7.82  5.03  3.70  1.65  1.77
1903  0.20  1.12  1.09  1.64  0.64  3.06  3.62  6.72  1.62  1.32  0.31  0.09
1904  1.22  0.22  1.20  5.48  3.16  2.08  6.94  2.60  1.95  1.50  0.06  2.02
1905  1.08  1.00  2.16  3.29  4.44  5.73  4.53  5.21  3.47  3.64  2.34  0.55
1906  2.07  0.86  1.84  2.96  2.21  3.80  2.67  4.69  3.24  1.18  2.29  1.46
1907  0.87  0.93  1.18  1.48  2.97  4.13 10.20  5.03  2.40  1.70  1.12  1.01
1908  0.46  1.15  1.43  2.69  9.89  5.93  1.56  6.54  0.94  3.68  0.95  0.31
1909  1.61  0.90  1.56  5.14  4.24  7.01  4.41  0.14  2.06  2.89  3.71  2.32
1910  1.72  0.20  0.33  1.13  3.26  3.11  0.86  2.40  3.82  0.68  0.53  0.20
1911  0.84  2.91  1.14  4.23  2.44  0.75  1.16  1.82  7.68  2.61  1.22  3.18
1912  0.53  1.86  2.87  2.75  5.62  2.60  3.07  3.52  4.20  3.75  1.11  0.30
1913  1.10  0.65  3.03  3.41  5.06  3.52  1.05  3.44  2.65  2.67  1.03  1.05
1914  0.85  1.24  1.18  1.52  4.83  3.89  1.22  1.77  4.81  3.57  0.35  1.28
1915  1.96  3.20  1.16  1.36  8.21  3.60  9.39  1.71  4.51  0.43  1.24  0.65
1916  2.66  0.61  0.60  2.44  3.87  2.42  1.50  2.62  1.72  2.11  1.46  0.65
1917  6.53  0.52  2.30  5.52  3.94  8.16  1.58  1.82  1.99  0.92  0.21  0.88
1918  0.78  1.45  0.29  1.81  5.87  5.63  1.18  2.54  0.91  3.81  2.10  1.35
1919  0.08  3.00  3.67  5.30  2.96  7.36  2.68  2.19  7.47  2.20  3.84  0.93
1920  0.44  0.74  3.92  4.09  3.14  1.25  5.66  2.11  4.44  1.89  1.63  1.38
1921  0.59  0.92  1.07  3.72  3.62  4.66  2.49  6.63  7.16  1.51  0.35  0.80
1922  0.85  0.64  2.25  2.84  6.87  1.63  7.13  6.63  3.00  3.41  2.54  0.25
1923  0.88  0.36  4.34  1.76  4.78  4.95  0.78  5.34  5.17  1.10  0.55  0.61
1924  1.02  1.98  3.10  0.78  1.26  9.30  0.98  4.15  3.47  0.77  0.53  1.62
1925  0.23  0.50  0.88  1.64  0.77  6.40  2.21  4.79  3.75  3.22  0.32  1.67

of the interval. After a little preliminary reconnoitering an 
appropriate number of classes and their limits can be determined. Thus, 
in Table 3, the highest value noted was 10.91 and the lowest 0.08 
(verify). The difference between these is 10.83, which suggests that 
if we took 20 classes we would have approximately a half inch as the 
width of a class interval. This, however, assumes we would start 
with 0.08 as our lower limit, which would give us awkward figures as 
limits. Therefore, our judgment suggests it would be better to start 
with 0 and continue by half-inch intervals as far as is necessary to 
take in the range of the given variates. We have estimated it will 
take approximately 20 of these; actually it turns out to be 22. This 
number of intervals and their width is consistent with the general 
conditions (a) and (b) given above. On page 16 are given some 
supplementary rules which in general are helpful in making a 
frequency distribution. 
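
The reckoning just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `suggest_classes` and its parameters are hypothetical, and the rounding of the width to a half inch and the choice of 0 as starting point are supplied by judgment, as in the text.

```python
import math

def suggest_classes(lowest, highest, tentative_classes, width, start=0.0):
    """Return (approximate width, number of classes actually needed).

    approximate width : range divided by the tentative number of classes
    number of classes : intervals of the chosen `width`, beginning at
                        `start`, required to take in the highest value
    """
    approx_width = (highest - lowest) / tentative_classes
    n_classes = math.floor((highest - start) / width) + 1
    return approx_width, n_classes

# Extremes noted for Table 3: lowest 0.08, highest 10.91.
approx, n = suggest_classes(lowest=0.08, highest=10.91,
                            tentative_classes=20, width=0.5)
print(round(approx, 4))  # roughly half an inch
print(n)                 # classes needed when starting at 0
```

With the Table 3 extremes the approximate width comes out a little over half an inch, and half-inch classes beginning at 0 number 22, in agreement with the count stated above.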

8. Distinction between Class Limits and Class Boundaries. The 
pairs of numbers written in the column of classes of a frequency 
distribution are the lower and upper class limits, sometimes called open 
class limits. For instance, 1.00-1.49 are the limits of the third class 
of Table 4. When the measurements of Table 3 were made, readings 
were recorded to the nearest hundredth of an inch. Thus, a 
measurement which was more than 1.485 and less than 1.495 was recorded 
as 1.49. Likewise, if a measurement was more than 0.995 but less 
than 1.005, it would be recorded as 1.00. Therefore, the third class 
of Table 4 includes all measurements more than 0.995 and less than 
1.495. These values are then the true or closed limits of the third 
class and are known as class boundaries or end values. A class 
boundary is the value halfway between the upper limit of one class and the 
lower limit of the next class. For example, the upper boundary of 
the fourth class of Table 4 is 1.995, which is the lower boundary 
of the fifth class. If we denote the variate values by x, the 
following table illustrates these remarks for the first five classes of 
Table 4. 


Class Limits    End-x    Mid-x
0.00-0.49       0.495    0.245
0.50-0.99       0.995    0.745
1.00-1.49       1.495    1.245
1.50-1.99       1.995    1.745
2.00-2.49       2.495    2.245


The width of a class interval is the same, however, whether the 
classes are expressed in terms of class limits or class boundaries, being 
the difference between the beginning of one class and the beginning 
of the next class. Similarly, the class mark as the mid-point of the 
interval is unaffected. Thus, for the class limits 1.00-1.49, the 
class mark is ½(1.00 + 1.49) = 1.245; for the corresponding class 
boundaries, the class mark is ½(0.995 + 1.495) = 1.245. 

The distinction between class limits and class boundaries is an 
important one in plotting graphs, but in tabulating it is the class 
limits that should be expressed. 
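
The relation between limits, boundaries, and class marks may be checked with a short Python sketch. The function names and the `unit` parameter (the precision of recording, here a hundredth of an inch) are assumptions made for the illustration.

```python
def class_boundaries(lower_limit, upper_limit, unit=0.01):
    """Boundaries lie half a recording unit outside the class limits."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

def class_mark(lower, upper):
    """Mid-point of an interval, from either limits or boundaries."""
    return (lower + upper) / 2

# Third class of Table 4.
low_b, high_b = class_boundaries(1.00, 1.49)
print(round(low_b, 3), round(high_b, 3))    # boundaries of the class
print(round(class_mark(1.00, 1.49), 3))     # mark from the limits
print(round(class_mark(low_b, high_b), 3))  # mark from the boundaries
```

Both calls to `class_mark` give the same value, since the boundaries extend the limits by equal amounts on each side.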

9. Rules for Making a Frequency Distribution. 

(1) Determine the range of the table by finding the difference 
between the highest value and the lowest value among the items. 

(2) Determine the number of equal parts into which the range 
shall be divided. The size of the class interval and the 
number of intervals depend upon the size and nature of the 
distribution. (Table 1 contains rather fewer classes than is usually 
desirable but an interval of 10 units is quite conventional in 
students’ grades. An interval of 5 would be used if grades 
of A, A—, B, B—, etc., were given instead of A, B, etc.) 
Intervals of 0.5, 1, 2, 3, 5, 7, or 10 are the most common. 

(3) Arrange a sheet with three headings: class interval, tally 
marks, frequency. 

(4) Read off the items in the raw table and for each one record a 
mark, as shown in Table 2. 

(5) Write the sum of the marks in each row in the frequency 
column. The sum of the frequencies should, of course, equal the 
total number of variates. 
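
Rules (3) to (5) amount to a tally, which can be sketched in Python as follows. The helper names `tally` and `label` are hypothetical; the twelve values are the 1891 row of Table 3, and the half-inch classes are those of Table 4. Values are converted to whole hundredths before classing, an implementation choice made here to avoid floating-point trouble at the class limits.

```python
from collections import Counter

def tally(values, width_hundredths=50):
    """Count the values falling in each class; key k is the k-th class from 0."""
    counts = Counter()
    for x in values:
        hundredths = round(x * 100)          # readings are to the nearest 0.01
        counts[hundredths // width_hundredths] += 1
    return counts

def label(k, width_hundredths=50):
    """Class limits of class k written as in Table 4, e.g. '1.00-1.49'."""
    lower = k * width_hundredths / 100
    upper = (k * width_hundredths + width_hundredths - 1) / 100
    return f"{lower:.2f}-{upper:.2f}"

row_1891 = [1.49, 1.30, 4.41, 1.11, 4.46, 2.80,
            3.01, 3.46, 2.33, 1.63, 2.93, 2.72]
counts = tally(row_1891)
for k in sorted(counts):
    print(label(k), counts[k])

# Rule (5): the frequencies must add up to the number of variates.
print(sum(counts.values()) == len(row_1891))
```

Applied to all of Table 3 instead of a single row, the same tally would produce the frequency column of a table such as Table 4.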

10. Cumulative Frequencies. The frequencies with which we have 
been concerned may be called absolute frequencies to distinguish 
them from two other kinds which will be mentioned in this course; 
namely, cumulative frequencies and relative frequencies. The first 
of these will be considered here. 

Sometimes a statistical investigation is concerned with the number 
or percentage of variates which are “less than” or “more than” 
a given value. This is frequently the case in educational tests and 
in wage or salary statistics. Our chief interest in such cases may be 
the accumulated frequency of the several class intervals up to some 
class boundary. Hence we are led to form a cumulative frequency 
table. Such a table is built up by successively adding the several 
(absolute) frequencies; thus: $f_1$, $f_1 + f_2$, $f_1 + f_2 + f_3$, etc., as 
illustrated in Table 7, where the data of Table 6 are used. We shall use 
N to denote the sum of all the frequencies. 


Table 6 — Distribution of Intelligence Quotients (IQ’s) of 905 School 
Children from 5 to 14 Years of Age. (Derived from 
L. M. Terman, The Measurement of Intelligence) 

IQ         Number
 55- 64        3
 65- 74       21
 75- 84       78
 85- 94      182
 95-104      305
105-114      209
115-124       81
125-134       21
135-144        5


The cumulative frequency (cum f) at any class is the total 
(absolute) frequency up to the upper boundary of that class. This is the 
reason for placing the cum f entries opposite the end-x values and on 
lines between the mid-x entries. Thus, in the cum f column of Table 
7, three students had IQ’s less than 64.5, 24 less than 74.5, etc. The 
entries in the column headed (cum f)/N give the percentages of the 
total frequency which are less than the values of the end-x column. 
Thus, from this column in Table 7, we can readily see that 88% of 
the children had IQ’s less than 114.5 and only 11% less than 84.5. 

Table 7 — Cumulative Distribution of IQ’s (Table 6) 

Class Mark   Frequency    Upper Boundary   Cum f             (Cum f)/N
Mid-x        f            End-x

                           54.5              0               0.000
 59.5          3 = f1
                           64.5              3 = f1          0.003
 69.5         21 = f2
                           74.5             24 = f1 + f2     0.027
 79.5         78
                           84.5            102               0.113
 89.5        182
                           94.5            284               0.314
 99.5        305
                          104.5            589               0.651
109.5        209
                          114.5            798               0.882
119.5         81
                          124.5            879               0.971
129.5         21
                          134.5            900               0.994
139.5          5
                          144.5            905 = N           1.000

Table 7 is known as a “less than” table. One could of course 
cumulate the frequencies from the bottom of the table, getting a 
“more than” distribution. The cum f column would then give the 
number of children whose IQ’s are more than the values at the lower 
boundaries of the several class intervals. 

The inverse operation to cumulating the frequencies is called 
“differencing” and is usually denoted by Δ (delta). If S denotes 
any series of values, then ΔS denotes the results obtained by 
subtracting the first value of S from the second value, the second from 
the third, etc. Differencing a column of cumulative frequencies 
obviously gives the absolute frequencies. Differencing a column 
of (cum f)/N values gives the f/N values. 
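
Cumulating and differencing can be illustrated with the frequencies of Table 6. The following Python sketch is offered only as an illustration; the variable names are assumptions, and `itertools.accumulate` from the standard library stands in for the successive additions.

```python
from itertools import accumulate

# Absolute frequencies of Table 6, lowest class first.
freqs = [3, 21, 78, 182, 305, 209, 81, 21, 5]
N = sum(freqs)

# "Less than" cumulation: one entry per class boundary, starting from 0.
cum_f = [0] + list(accumulate(freqs))
rel_cum = [round(c / N, 3) for c in cum_f]

# "More than" cumulation: the number of variates above each boundary.
more_than = [N - c for c in cum_f]

# Differencing: subtract each value from the one that follows it.
delta = [b - a for a, b in zip(cum_f, cum_f[1:])]

print(cum_f)
print(rel_cum)
print(more_than)
print(delta == freqs)   # differencing undoes cumulating
```

The list `cum_f` reproduces the cum f column of Table 7, and differencing it recovers the absolute frequencies.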


Exercises 

1. What is the width of the class interval and what are the values of 
   the class marks in Table 2? 

2. Tabulate the grades of Table 1, using class intervals of 5 units. 

3. With reference to Table 3, is it easy to answer such questions as 
   the following: 

   (a) In how many instances was the monthly rainfall between 2 inches 
       and 3 inches? 

   (b) In how many instances was the rainfall less than 5 inches? 

   (c) What was the smallest monthly rainfall recorded? 

   (d) What per cent of the total measured between 5 inches and 10 inches? 

   (e) What measurement is the most common? 

4. Refer to Table 4 and then answer the above questions. 

5. Using your own judgment as to the most appropriate class interval, 
   make a frequency distribution of the monthly rainfall for Des Moines 
   from 1890 to 1925 (Table 5). 

6. For Table 6 state the class boundaries (end values) and the class marks. 

7. Difference the cum f column of Table 7. 

8. Read the following references: 

   (a) Mathematics Essential for Elementary Statistics — Walker, Chapter II. 

   (b) Standards and Requirements in Statistics — Belcher. Journal American 
       Statistical Association, vol. 21, p. 424. 

11. Additional Distributions. The following distributions which 
will be referred to in subsequent chapters will serve as illustrative 
and laboratory material. They are not chosen on account of the 
importance of the data but merely to exemplify methods. 

Table 8 — Distribution of Lengths of 
995 Telephone Calls. Time in Seconds 

Time        Number of Calls
  0- 99           1
100-199          28
200-299          88
300-399         180
400-499         247
500-599         260
600-699         133
700-799          42
800-899          11
900-999           5

(For future reference: $\bar{x}$ = 477.3 secs., $\sigma$ = 148.5 secs.) 

Table 9 — Distribution of Weight in Pounds Among 
1000 8-Year-Old Glasgow Schoolgirls 

Weight (mid-values)    Frequency
29.5                       1
33.5                      14
37.5                      56
41.5                     172
45.5                     245
49.5                     263
53.5                     156
57.5                      67
61.5                      23
65.5                       3


Table 10 

Twelve dice were thrown 4096 times; only a throw of 6 was counted a success. 
The observed distribution follows: 

Successes $x_i$   Frequency $f_i$
 0                   447
 1                  1145
 2                  1181
 3                   796
 4                   380
 5                   115
 6                    24
 7                     7
 8                     1
 9                     0
10                     0
11                     0
12                     0

(For future reference: $\bar{x}$ = 2, $\sigma$ = 1.296) 

Table 11 

Twelve dice were thrown 4096 times; a throw of 4, 5, or 6 points being reckoned 
a success. The following distribution was recorded: 

Successes    Frequency
 0               0
 1               7
 2              60
 3             198
 4             430
 5             731
 6             948
 7             847
 8             536
 9             257
10              71
11              11
12               0

(For future reference: $\bar{x}$ = 6.139, $\sigma$ = 1.712) 


Table 12 — Frequency Distribution of the Weights of 1000 Male 
Students (Original Measurements Made to Nearest Half Pound) 

Class         Class                  Cumulative
(Pounds)      Mark      Frequency    Frequency
 90- 99.5      94.75        2             2
100-109.5     104.75       21            23
110-119.5     114.75      104           127
120-129.5     124.75      196           323
130-139.5     134.75      248           571
140-149.5     144.75      197           768
150-159.5     154.75      133           901
160-169.5     164.75       47           948
170-179.5     174.75       25           973
180-189.5     184.75       14           987
190-199.5     194.75        7           994
200-209.5     204.75        4           998
210-219.5     214.75        0           998
220-229.5     224.75        0           998
230-239.5     234.75        1           999
240-249.5     244.75        1          1000


(For future reference: $\bar{x}$ = 138.65, $\sigma$ = 18.03, $\alpha_3$ = .94) 

Table 13 — Distribution of Span (Central Values) in Inches Among 2000 
Adult Males (Original Measurements to the Nearest Inch) 

Span    Frequency      Span    Frequency
58.5        1          71.5       217
59.5        2          72.5       176
60.5        1          73.5       132
61.5        6          74.5        82
62.5        7          75.5        48
63.5       22          76.5        20
64.5       55          77.5        16
65.5      111          78.5        12
66.5      146          79.5         3
67.5      182          80.5         1
68.5      229          81.5         2
69.5      265          82.5         1
70.5      263          Total     2000


The following references are recommended to those who desire some 
distributions which may be more interesting in themselves: 

(a) Per cent Distribution of Deaths in Each Age Period, by Specified Causes. 
White Males and White Females, United States, 1942. Source: 
Metropolitan Life Insurance Company, Statistical Bulletin, October 1945, p. 7. 

(b) Age of American Military Leaders. Source: Metropolitan Life 
Insurance Company, Statistical Bulletin, June, July, August, 1945. 

(c) Employment Status of the Population by Age and Sex. Source: 
Population, Third Series, The Labor Force, Table 5, 16th Census. 

(d) Distribution of Population by Age. Source: Statistical Abstract, 1943, 
p. 24. 



CHAPTER II 


GRAPHICAL REPRESENTATION 

1. The Function Concept. Variables which are linked or related 
in some way are encountered in various fields of human experience. 
Several variables may be linked but we shall, for the present, 
consider the simple case where only two variables are involved. For 
example, the two related variables may be time and population, 
variate and frequency, rate of interest and accumulated principal, 
age and insurance premium. The primary purpose of a graph is to 
show diagrammatically how the values of one of two linked variables 
change with those of the other. One of the most useful 
applications of the graph occurs in connection with the representation of 
statistical data. 

Underlying the intelligent use of graphs is the concept of function, 
which is a fundamental notion in mathematics and its applications. 
The mathematical meaning of function is a technical one, entirely 
different from the ordinary meaning. The student usually meets 
the word for the first time in algebra, when a linear or quadratic 
expression is spoken of as a function of x. An example is the equation 

$y = P(1 + x)^2.$ 

The expression on the right is the function of x (P being constant) 
and for convenience it is denoted by the single letter y. Here x is an 
interest rate and y denotes the amount to which P dollars will 
accumulate in two years at x% per year. 

The statement that y is a function of x is written symbolically in 
the form 

$y = f(x).$ 

This implies that a value of the function y is determined when a value 
is assigned to the variable x. For this reason, x is called the 
independent variable and y the dependent variable. In place of f other letters 
may be used. Thus, any one of the symbols 

$g(x),\ h(x),\ F(x),\ \phi(x),$ 

and so on, denotes a function of x. The same symbol may be used 
to denote different functions in different problems, but different 
symbols are required to represent different functions in the same 
problem or discussion. 

Examples: 

$f(x) = 5x^2 - x + 2,$ 

$\phi(x) = \ldots$ 


Any mathematical expression involving a variable x is a function 
of x. However, the word is often used to designate a relation that is 
completely divorced from any equation or expression. The central 
idea conveyed by this more general meaning is that of a 
correspondence between values of y and values of x. The following definition is 
the result of a development over a long period and its formulation is 
due to Dirichlet, a famous German mathematician (1805-59). 

Definition. Let there be a set of values assumed by the 
independent variable x. If to each x in the set, there corresponds one or 
more values of y, then y is said to be a function of x in the set. 

It should be observed that this definition¹ is freed from any notion 
of the necessity of specifying the mathematical relation between x 
and y. We may or may not know the special method by which the 
correspondence is set up. A mathematical formula or equation 
between x and y may not even exist. A function may thus be 
considered as being equivalent to a table in which one may look up 
any x of the set of the definition, and find the corresponding y. 
Much of the data in statistics comes under this general definition 
of function. Thus, in the following table, net earning is a function 
of the year, whether or not there is any equation defining that 
functional relationship. 

Here the function is defined only for the indicated points 
which correspond to the values given in the table. The straight 
lines are drawn to help the reader visualize the relative 
positions of these values and not to represent the function at 
intermediate points. They may, however, be thought of as a first 
approximation to the unknown function between the given values. 
Such a representation of the function could not, of course, be 
assumed in the case of discrete variates because then the function 
is discontinuous and does not exist except for the given values. 

¹ A classical example is the function which is defined for the infinite set of 
numbers from x = 0 to x = 1 to be unity for all rational numbers and zero for 
all irrational numbers. 

Referring again to the above definition, if there is only one value 
of y corresponding to each value of x then y is called a single-valued 
function of x; otherwise y is said to be a multiple-valued function of 
x. Child weight would be an example of a multiple-valued function 
of age, being different for different children. The weight of a 
particular child would be a single-valued function of age. For the most 
part we shall be concerned with single-valued functions. 
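
The idea of a function as a table of corresponding values, single-valued or multiple-valued, may be pictured with a Python dictionary. In the sketch below the frequencies are those of Table 10; the children's weights and the helper `is_single_valued` are hypothetical and serve only to illustrate the distinction.

```python
# Frequencies of Table 10: the frequency is a function of the number of
# successes, defined purely by a table and by no formula.
frequency = {0: 447, 1: 1145, 2: 1181, 3: 796, 4: 380, 5: 115,
             6: 24, 7: 7, 8: 1, 9: 0, 10: 0, 11: 0, 12: 0}

# Hypothetical weights (pounds) of several children at each age (years).
# One age corresponds to several weights, so weight is multiple-valued.
weights_by_age = {6: [41.0, 44.5, 47.0], 7: [45.5, 49.0, 52.5]}

def is_single_valued(table):
    """True if every x in the table corresponds to exactly one y."""
    return all(not isinstance(y, (list, tuple, set)) or len(y) == 1
               for y in table.values())

print(frequency[3])                      # look up the y corresponding to x = 3
print(is_single_valued(frequency))
print(is_single_valued(weights_by_age))
```

Looking up `frequency[3]` is exactly the operation of finding the y that corresponds to a given x; no equation connecting the two is needed.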

2. Charts. A detailed study of the technique of representing data 
by broken lines, by charts or bar graphs, etc., will not be undertaken 
here. It is a rather specialized and non-mathematical subject, and 
the student interested in plain-scale cartography can readily find 
books on the subject which are very readable.¹ (A discussion of 
ratio charts is given in Chapter VII.) 

¹ For example, 

(a) Graphs: How to Make and Use Them — H. Arkin and R. Colton. 2nd ed. 
Harper. 

(b) Engineering and Scientific Graphs for Publication, American Standards 
Association, New York. 

(c) Reference 8 in our Introduction. 

Fig. 1 — Frequency Polygons for Distributions of Discrete Variates 
I. Frequency Polygon for the Distribution of Table 10 
II. Frequency Polygon for the Distribution of Table 11 

3. Frequency Polygon. We present now a discussion of the 
graphs that are used in connection with frequency distributions. A 
distribution of discrete variates may be represented graphically by 
plotting the points $(x_1, f_1), (x_2, f_2), \ldots, (x_k, f_k)$, and drawing a broken 
line through them. Such a graph is called a frequency polygon 
because it is a polygon formed by connecting the tops of a series of 
ordinates whose lengths are proportional to the various frequencies 
and whose abscissas correspond to the variate values of the 
distribution. Figure 1 will serve as an illustration. For a table of 
discrete variates the function exists only for the given values. 
Likewise, its graph is discontinuous. The straight lines connecting the 
points serve merely to “carry the eye,” thus giving a better idea of 
the shape and position of the distribution. 



Fig. 2 — Histogram for Table 6 


4. Histogram. If the frequency distribution is one of grouped 
variates (discrete or continuous) it is better to use some form of 
graphical representation which recognizes the fact that the several 
measurements in a table do not lie precisely at the class marks but 
are spread out over the intervals of which the class marks are centers. 
This may be accomplished through the use of a histogram. A 
histogram is a series of rectangles erected at the class boundaries with 
altitudes proportional to the respective class frequencies, and 
centered on the class marks. Thus the frequencies are represented by 
areas. (See Figure 2.) If the bases are all of unit length then the 
altitudes are also equal to the frequencies. The histogram is an 
important and useful graphical device for representing frequency 
distributions. 
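
The statement that frequencies are represented by areas can be made concrete with a small Python sketch using the classes of Table 6. The variable names are assumptions, and so is the scaling: each altitude is taken as the frequency divided by the width of the base, which is one choice of the constant of proportionality and the one that makes each area numerically equal to its class frequency.

```python
# Class boundaries (end-x values) and frequencies of Table 6.
boundaries = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5, 134.5, 144.5]
freqs = [3, 21, 78, 182, 305, 209, 81, 21, 5]

# One rectangle per class: (lower boundary, upper boundary, altitude).
rectangles = [(lo, hi, f / (hi - lo))
              for lo, hi, f in zip(boundaries, boundaries[1:], freqs)]

# Each rectangle is centered on its class mark.
class_marks = [(lo + hi) / 2 for lo, hi, _ in rectangles]

# Area = base times altitude; the total area is the total frequency N.
areas = [(hi - lo) * h for lo, hi, h in rectangles]
print(class_marks)
print(round(sum(areas)))
```

With bases of unit length the division would leave the frequencies unchanged, which is the special case mentioned in the text.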

5. Frequency Curves. The shape of the distribution may be 
emphasized by constructing a continuous frequency curve such that 
the areas under the curve between the ordinates at the upper and 
lower boundaries of the various rectangles will equal approximately 
the areas of the corresponding rectangles. Thus, in Figure 3, the 
area of all the rectangles represents the total frequency 1000, and the 
area of the three rectangles labeled A, B, C represents the number 
of individuals weighing between 139.75 pounds and 169.75 pounds. 
The dotted line represents roughly the frequency curve 
corresponding to the histogram. 

Fig. 3 — Histogram and Frequency Curve for a Distribution 
of Continuous Variates 

Representing each class frequency of a distribution of continuous 
variates by a rectangle is equivalent to saying that we realize that 
the function exists for points other than the class marks, but we do 
not know what it is for these points, and so as a first approximation 
we assume that the variates are uniformly distributed over each 
interval, which is equivalent to regarding them as concentrated at the 
class marks. If the class intervals were made smaller and smaller 
and at the same time the number of variates were proportionally 
increased, the upper bases of the rectangles would approach more 
and more the frequency curve which represents the ideal or 
theoretical mathematical function relating frequency with variate value 
for the given distribution. 

A frequency curve is often drawn for convenience in describing 
the properties of an observed distribution, although strictly 
speaking, the concept of a frequency curve is applicable only to an infinite 
universe of continuous variates. The data at hand are supposed 
to be a “sample” from the universe represented by the frequency 
curve. 

The more common types of distributions may be represented by 
bell-shaped curves which are either symmetrical or skew. For 
elementary purposes it is sufficient to consider frequency distributions 
as of these two general types. In passing, we may also mention two 
other types which are known as J-shaped and U-shaped. For 
examples of these types see Yule and Kendall, An Introduction to the 
Theory of Statistics, Ch. VI. 




Fig. 4 — Ogive for Table 7 


6. Ogives. The graphs of cumulative frequencies are called 
ogives. The ogive for Table 7 is shown in Figure 4 and is constructed 
by plotting the points (54.5, 0), (64.5, 3), etc., as in algebra, and 
joining them with straight lines. 

The student should observe that while cum f is a function of x it is 
defined for the end-x values only. Occasions will perhaps arise when 
we desire the x-value corresponding to some intermediate cum f 
value, say 453 in Figure 4. Conversely, we might wish to know the 
cum f value for some intermediate x-value, say at x = 97. Strictly 
speaking, we do not know the answer in either case, inasmuch as we 
do not know how the IQ’s are distributed over the interval. 
Perhaps all the individual values in the interval 94.5-104.5 (say) are 
less than 97; perhaps none are. The fairest assumption we can make 
is that they are uniformly distributed throughout the interval. This 
means graphically that we represent cum f over each interval by a 
straight line, as is done in Figure 4. We may now interpolate under 
this line for intermediate values. This is “straight line 
interpolation” and is what the student uses when he interpolates in logarithms. 

More refined methods exist for interpolating values of a function 
between the observed values but their study constitutes a separate 
branch of mathematics beyond the scope of this course. It should 
be observed that the straight line used here is a first approximation 
to the unknown function, and not merely a device to carry the eye 
as in the case of a frequency polygon for a discontinuous distribution 
of discrete variates. 
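
Straight line interpolation on the ogive of Table 7 can be sketched in Python as below. The function names `cum_f_at` and `x_at` are hypothetical, and the sketch rests on the same assumption as the text, namely that the variates are uniformly distributed within each class interval.

```python
from bisect import bisect_right

# Points of the ogive: end-x values and cum f values of Table 7.
end_x = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5, 134.5, 144.5]
cum_f = [0, 3, 24, 102, 284, 589, 798, 879, 900, 905]

def _segment(points, value):
    """Index i of the segment [points[i], points[i+1]] containing value."""
    i = bisect_right(points, value) - 1
    return min(max(i, 0), len(points) - 2)

def cum_f_at(x):
    """Interpolated cumulative frequency at an intermediate x-value."""
    i = _segment(end_x, x)
    return cum_f[i] + (cum_f[i + 1] - cum_f[i]) * (x - end_x[i]) / (end_x[i + 1] - end_x[i])

def x_at(c):
    """Interpolated x-value at an intermediate cumulative frequency."""
    i = _segment(cum_f, c)
    return end_x[i] + (end_x[i + 1] - end_x[i]) * (c - cum_f[i]) / (cum_f[i + 1] - cum_f[i])

print(cum_f_at(97))          # cum f at x = 97
print(round(x_at(453), 2))   # x-value at cum f = 453
```

For x = 97 the interpolation takes a quarter of the 305 variates in the interval 94.5-104.5 and adds them to the 284 below it. The sketch does not guard against values outside the range of the table.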

7. Relation of Cum f to Areas. The sum of the frequencies (cum f) 
up to any value of x means, graphically, the sum of the areas of 
the rectangles of the histogram up to that value. Thus in Figure 4, 
the ordinate erected at x = 84.5 represents the sum of the frequencies 
(3 + 21 + 78) = 102 (Figure 2). If a frequency curve represents 
the distribution, then cum f, corresponding to any value of x, is the 
area under the curve up to that value. Thus, in Figure 3, cum f 
corresponding to x = 139.75 is approximately the area under the 
smooth curve up to x = 139.75, and the total area under the curve 
is cum f = N. 


Exercises 

1. (a) If $f(x) = 2^x$, exhibit $f(-x)$. Give the value of $f(3)$, of $f(-2)$. 

(b) Let f(x) denote a given function which is defined for all real values of 
x under consideration so that if c is any admissible number f(c) is 
defined. What is the graphical meaning of f(c)? 

2. If $\phi(x) = Ce^{-x^2}$, (a) show that $\phi(x) = \phi(-x)$; (b) give the value of $\phi(0)$. 

3. If $h(x) = ax^2 + bx + c$, and $h(x) = h(-x)$, show that $b = 0$. 

4. If $f(x) = a^x$, show that $f(u) \times f(v) = f(u + v)$. 

5. If $g(x) = \log\{(1 - x)/(1 + x)\}$, show that 
   $g(u) + g(v) = g\{(u + v)/(1 + uv)\}$. 

6. Make a histogram for the data of Table 4. 

7. Same as exercise 6 for Table 8 or 9. 

8. Construct an ogive for the cumulative frequencies given in Table 12. 

9. Find the cumulative frequencies and construct the ogive for Table 9. 

10. For further discussion of ogive curves and their uses, read the following 
references: 

(a) Elements of Statistics — Davis and Nelson, pp. 23-28. 

(b) The Mathematics of Statistics — Burgess, pp. 61-72. 



CHAPTER III 
AVERAGES 


1. It was pointed out in Chapter I that classification of the 
variates of any long series is the first step necessary to overcome the 
confusion of detail in the original observations, and to make 
comparisons with other distributions possible. In Chapter II graphical 
methods were studied which describe, to some extent, the shape and 
position of the distribution. Although these methods are helpful, 
their contribution is largely qualitative. 

It is desirable to formulate quantitative descriptions for 
characterizing a distribution, and as an aid in this direction averages are very 
useful. They are also called measures of location. An average is a 
quantity locating a central value of the distribution. In a sense, 
it is a typical value of the whole set of variates, although it is not 
necessary that it actually have the value of one of the items of the 
set it represents. There are five averages in common use. These 
are: arithmetic mean, mode, median, geometric mean, and harmonic 
mean. The means and median are most frequently used although 
the arithmetic mean is by far the most important in general 
statistical work, and the others are of service in special cases. We will 
consider them in the order named. First, however, it will be 
desirable to discuss certain symbols and notation which will facilitate the 
development of formulas. 

2. Notation. If x denotes a variable, then $x_1, x_2, \ldots, x_N$ are 
general symbols for the values which x may take. When we are 
concerned with a sum like the following, 

$x_1 + x_2 + x_3 + x_4 + \cdots + x_i + \cdots + x_N,$ 

it is customary to designate it by placing the Greek capital letter $\Sigma$ 
(sigma) before the general term, thus 

$\sum_{i=1}^{N} x_i = x_1 + x_2 + \cdots + x_i + \cdots + x_N.$ 

The symbol $\Sigma$ is a sort of mathematical verb and the notation
written above and below it may be called adverbs. Mathematicians
call $\Sigma$ an operator and speak of the "adverbs" as limits. When
$\sum_{i=1}^{N}$ is placed before any quantity, it means, "add up all
quantities like ... which are formed by giving i the values of every
positive integer from i = 1 to i = N, inclusive." Thus if $x_i$ stands
for "variates" in Table 1, $x_1$ refers to the first value 75, $x_2$
refers to the second value 80, etc., and $x_N$ refers to the last
value. Here N = 100. Hence the compact notation $\sum_{i=1}^{100} x_i$
denotes the sum of all the variates in Table 1. The symbol
$\sum_{i=1}^{N} x_i$ is read, "the summation of x-sub-i, i varying (or
running) from one to N." The subscript i is called the index of
summation. Any letter may be used as an index but it is conventional
to use i or j. Also the upper limit may be denoted by any letter but
we shall use N to denote the total number of variates (some of which
may be alike) in a set.

If a variable x is to take on the particular values 1, 2, 3, etc.,
instead of the general values $x_1, x_2, x_3$, etc., then x itself
becomes the index of summation and we write x = 1 underneath $\Sigma$.
Thus

$$\sum_{x=1}^{N} x^2 = 1^2 + 2^2 + 3^2 + \cdots + N^2.$$

Frequently the index of summation is understood from the context
and the notation at the top and bottom of $\Sigma$ may be omitted if no
ambiguity results.

It is imperative that the student master, as soon as possible, the
significance and utility of the $\Sigma$ notation.

Illustrations:

1. $\sum_{i=1}^{N} 3x_i = 3x_1 + 3x_2 + \cdots + 3x_N = 3(x_1 + x_2 + \cdots + x_N)$.

2. $\sum_{i=1}^{5} (x_i + c) = (x_1 + c) + (x_2 + c) + (x_3 + c) + (x_4 + c) + (x_5 + c) = (x_1 + x_2 + x_3 + x_4 + x_5) + 5c$.

3. $\sum_{i=1}^{4} x_i f_i = x_1 f_1 + x_2 f_2 + x_3 f_3 + x_4 f_4$.

4. $\sum_{i=1}^{4} x_i y_i = x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4$.

5. $\sum_{x=1}^{N} x^3 = 1^3 + 2^3 + 3^3 + \cdots + N^3$.

The following simple theorems will be useful in our work.

Theorem I. The summation of an algebraic sum of two or more
terms is the same algebraic sum of the $\Sigma$'s of these terms taken
separately. In symbols:

$$\sum_{i=1}^{N} (x_i + y_i - z_i) = \sum_{i=1}^{N} x_i + \sum_{i=1}^{N} y_i - \sum_{i=1}^{N} z_i.$$


Theorem II. A constant factor may be removed from under the
summation sign and written outside as a factor. Thus,

$$\sum_{i=1}^{N} c x_i = c \sum_{i=1}^{N} x_i.$$


Proofs: It is left as an exercise for the student to prove these two
theorems by expanding the expressions.

Theorem III. If the expression under $\sum_{i=1}^{N}$ is a constant c,
the expanded result is Nc.


Examples:

1. $\sum_{i=1}^{N} c = c + c + \cdots + c = Nc$.

2. $\sum_{i=1}^{N} (x_i - c) = \sum_{i=1}^{N} x_i - \sum_{i=1}^{N} c$, by Theorem I
   $= \sum_{i=1}^{N} x_i - Nc$, by Theorem III.

The above theorems hold also if we replace the notation
$\sum_{i=1}^{N} x_i$ by $\sum_{x=1}^{N} x$, etc.
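Theorems I to III can be checked numerically. The following is a minimal sketch in Python; the lists `x`, `y`, `z` and the constant `c` are hypothetical values chosen only for illustration and do not come from the text. It illustrates the identities and is not a proof.

```python
# Hypothetical data, chosen only to illustrate Theorems I-III.
x = [3, 5, 8, 2]
y = [1, 4, 6, 7]
z = [2, 2, 9, 0]
c = 3
N = len(x)

# Theorem I: the summation of (x_i + y_i - z_i) equals the same
# algebraic sum of the three separate summations.
theorem_1 = (sum(xi + yi - zi for xi, yi, zi in zip(x, y, z)),
             sum(x) + sum(y) - sum(z))

# Theorem II: a constant factor may be moved outside the summation.
theorem_2 = (sum(c * xi for xi in x), c * sum(x))

# Theorem III: summing the constant c over N terms gives N*c.
theorem_3 = (sum(c for _ in range(N)), N * c)

for name, (left, right) in [("I", theorem_1), ("II", theorem_2), ("III", theorem_3)]:
    print(f"Theorem {name}: left member = {left}, right member = {right}")
```

In each pair the two members agree, as the theorems require.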

The next two theorems have to do with summing integers. The 
numbers used in counting, 

1, 2, 3, 4, 5, ...

are called integers or natural numbers.

Theorem IV. The sum of the first N integers is $\frac{N(N+1)}{2}$.
In symbols:

$$\sum_{x=1}^{N} x = \frac{N(N+1)}{2}.$$

This result follows from the fact that the integers form an arithmetic
progression.

Theorem V. The sum of the squares of the first N integers is
$\frac{N(N+1)(2N+1)}{6}$. In symbols:

$$\sum_{x=1}^{N} x^2 = \frac{N(N+1)(2N+1)}{6}.$$


Proof: Let us take the identity $x^3 - (x - 1)^3 = 3x^2 - 3x + 1$,
and sum each side for x = 1 to N. Thus,

$$\sum_{x=1}^{N} [x^3 - (x - 1)^3] = \sum_{x=1}^{N} [3x^2 - 3x + 1].$$

Applying Theorems I-III to the right member we have

$$\sum_{x=1}^{N} [x^3 - (x - 1)^3] = 3\sum_{x=1}^{N} x^2 - 3\sum_{x=1}^{N} x + N.$$

Performing the indicated sum in the left member, we have

$$(1^3 - 0^3) + (2^3 - 1^3) + \cdots + [N^3 - (N - 1)^3] = N^3,$$

since every term except $N^3$ cancels.

Therefore $N^3 = 3\sum x^2 - 3\sum x + N$.

Hence, using Theorem IV and simplifying,

$$\sum_{x=1}^{N} x^2 = \frac{2N^3 + 3N(N + 1) - 2N}{6},$$

whence

$$\sum_{x=1}^{N} x^2 = \frac{N(N + 1)(2N + 1)}{6}.$$
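As a numerical check on Theorems IV and V and on the telescoping step used in the proof, the closed forms can be compared with direct summation. This is only a sketch; the function names are illustrative and the range of N tested (1 to 100) is an arbitrary choice.

```python
def sum_of_integers(n):
    """Closed form of Theorem IV: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def sum_of_squares(n):
    """Closed form of Theorem V: 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6


for n in range(1, 101):
    # Direct summation agrees with both closed forms.
    assert sum_of_integers(n) == sum(range(1, n + 1))
    assert sum_of_squares(n) == sum(x * x for x in range(1, n + 1))
    # The left member of the proof telescopes to n^3.
    assert sum(x**3 - (x - 1)**3 for x in range(1, n + 1)) == n**3

print(sum_of_integers(100), sum_of_squares(10))  # prints: 5050 385
```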


3. Arithmetic Mean. The arithmetic mean of a set of variates is
defined as the sum of the variates divided by their number. We are
thinking now of a set of ungrouped variates, like that of Table 1. If
we use the symbol $\bar{x}$ to represent the arithmetic mean of the N
variates $x_1, x_2, x_3, \ldots, x_N$, then

$$\bar{x} = \frac{1}{N}(x_1 + x_2 + x_3 + \cdots + x_N),$$

or using the more compact notation of the preceding section, we have

(1)  $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$.

Each item in the set is thus represented in the arithmetic mean in
proportion to its magnitude.

As an illustration, it is easily verified that for the set of grades given
in Table 1,

$$\bar{x} = \frac{7267}{100} = 72.67.$$

Computing the mean $\bar{x}$ strictly according to definition (1) may be
called the serial method to distinguish it from other methods which
will be presented. This definition is applicable when N is so small
that a grouping of the variates into a frequency distribution is not
feasible.

If x refers to the integers from 1 to N their mean is

(1a)  $\bar{x} = \frac{1}{N}\sum_{x=1}^{N} x = \frac{N + 1}{2}$.

4. Weighted Arithmetic Mean. It will be noticed that several of
the grades given in Table 1 are alike. For example, 80 occurs seven
times. It should be evident that the same result would be found for
the mean if, instead of summing the individual values, each value was
first multiplied by the frequency with which it occurs and all such
products were then added. In general, if the values $x_1, x_2, \ldots, x_k$
occur with corresponding frequencies $f_1, f_2, \ldots, f_k$, respectively,
where $f_1 + f_2 + \cdots + f_k = N$, it follows that

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k},$$

or, in shorter notation

(2)  $\bar{x} = \frac{1}{N}\sum_{i=1}^{k} x_i f_i$, where $N = \sum_{i=1}^{k} f_i$.

¹ When there is no ambiguity, the arithmetic mean is often referred to as the
mean.

When obtained in this way, $\bar{x}$ is generally called a weighted
arithmetic mean. The term originated in experimental science where
some readings which have been made under more favorable conditions
are "weighted" according to their reliability or importance. When
the weights have been chosen, they become, essentially, frequencies.

If the x's are added individually, the f's become unity, and
equation (2) reduces to (1). The student should notice that, for the
same data, $\sum_{1}^{k} x_i f_i$ is numerically equal to $\sum_{1}^{N} x_i$.
He should also observe that N refers to the number of variates in the
set (some of which may be alike), whereas k refers to the number of
different values of x in the set and hence to the number of products of
the form $x_i f_i$ where $f_i$ is the number of times $x_i$ occurs. In the
following example, N = 8 and k = 4.

Example. For the values 6, 8, 7, 6, 5, 7, 6, 5,

$$\sum_{i=1}^{8} x_i = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 = 6 + 8 + 7 + 6 + 5 + 7 + 6 + 5 = 50.$$

$$\sum_{i=1}^{4} f_i x_i = f_1 x_1 + f_2 x_2 + f_3 x_3 + f_4 x_4 = 2 \cdot 5 + 3 \cdot 6 + 2 \cdot 7 + 1 \cdot 8 = 50, \qquad \sum_{i=1}^{4} f_i = 8.$$

By either method, $\bar{x} = 50/8 = 6.25$.
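The agreement of formulas (1) and (2) for this example can be reproduced in a few lines. The sketch below uses Python's standard `collections.Counter` to build the frequencies; the variable names are illustrative only.

```python
from collections import Counter

values = [6, 8, 7, 6, 5, 7, 6, 5]

# Serial method, formula (1): add the N variates and divide by N.
serial_mean = sum(values) / len(values)

# Weighted method, formula (2): multiply each different value by its
# frequency, add the k products, and divide by N = sum of frequencies.
freq = Counter(values)            # value -> number of occurrences
N = sum(freq.values())
k = len(freq)
weighted_mean = sum(x * f for x, f in freq.items()) / N

print(N, k, serial_mean, weighted_mean)  # prints: 8 4 6.25 6.25
```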


Exercises

1. Write in expanded form:

(a) $\sum x_i f_i$;  (b) $\sum x_i y_i$;  (c) $\sum (x_i - \bar{x}) f_i$.

2. Write in expanded form:

(a) $\sum_{i=1}^{n_1} f_i$;

(b) $\sum_{i=n_1+1}^{n_1+n_2} f_i$;

(c) $\sum_{i=1}^{n_1} x_i f_i + \sum_{i=n_1+1}^{n_1+n_2} x_i f_i$.

3. Express 2(c) as a single summation, if $n_1 + n_2 = k$.

4. Write in the abbreviated form, using $\Sigma$:

(a) $x_1 f_1 + x_2 f_2 + \cdots + x_k f_k$.

(b) $(x_1 - \bar{x}) f_1 + (x_2 - \bar{x}) f_2 + \cdots + (x_k - \bar{x}) f_k$.

(c) $\frac{1}{N}[(x_1 - \bar{x})^2 f_1 + (x_2 - \bar{x})^2 f_2 + \cdots + (x_k - \bar{x})^2 f_k]$.

5. Prove:

(a) $\sum_{1}^{k} (x_i + 1)^2 f_i = \sum_{1}^{k} x_i^2 f_i + 2\sum_{1}^{k} x_i f_i + N$.

(b) $\sum_{x=0} x(x - 1)p = \sum_{x=2} x(x - 1)p$.

6. Compute the value of exercise 1(c) for the example in §4, using the following
form:

x_i    f_i    (x_i - x̄)    (x_i - x̄) f_i
5      2      -1.25         -2.50
6      3
7      2      ?             ?
8      1

$\sum (x_i - \bar{x}) f_i = ?$

7. Distinguish between $\sum_{i=1} x_i y_i$ and $\left(\sum_{i=1} x_i\right)\left(\sum_{i=1} y_i\right)$. Write in expanded form.

8. (a) Express in $\Sigma$ notation: Each different variate is multiplied by its own
f and the sum of the results is divided by N.

(b) Give word statements of the expressions in Exercise 4.

(c) Express the general polynomial of degree n in x,

$$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,$$

in $\Sigma$ notation.

9. Using the identity

$$x^2 - (x - 1)^2 = 2x - 1$$

derive the result

$$\sum_{x=1}^{N} x = \frac{N(N + 1)}{2}$$

by a method analogous to the proof of Theorem V.

10. (a) Express in abbreviated notation: The sum of the squares of the x's
divided by the square of their sum.

(b) If x refers to the integers from 1 to N, evaluate your answer to (a) in
terms of N.

(c) Show that the mean of the first N integers is (N + 1)/2.

5. Arithmetic Mean from Frequency Table. The variates in each
class interval of a frequency distribution are assumed to have the
value of the class mark for that interval. Therefore, we may use
formula (2) to find the mean of a frequency distribution. In this
case, $x_i$ represents the mid-value of the ith class interval, $f_i$ the
corresponding frequency, and k the number of intervals; i running from 1
to k. The method of applying (2) is illustrated in Table 14 from the
data of Table 2.


Table 14

Class Interval   Class Mark x   Frequency f   Product fx
30-39            34.5             2              69.0
40-49            44.5             3             133.5
50-59            54.5            11             599.5
60-69            64.5            20            1290.0
70-79            74.5            32            2384.0
80-89            84.5            25            2112.5
90-99            94.5             7             661.5
Totals                           Σf = 100       Σfx = 7250.0

$$\bar{x} = \frac{7250}{100} = 72.50.$$

If we denote the class interval by c then it is obvious that c = 10 in Table 14.


In this connection it is interesting to note that our result here
differs very little from the true value 72.67 and therefore our
assumption that all values in a given class may be taken as the class mark
seems to cause little error in the result obtained for the mean. This
can be proved mathematically (under certain assumptions) and will
be referred to later.
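The computation of Table 14 is a direct application of formula (2) with the class marks as the x's. A short sketch in Python, using the class marks and frequencies printed in the table (the variable names are illustrative):

```python
# Class marks and frequencies as printed in Table 14.
class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [2, 3, 11, 20, 32, 25, 7]

N = sum(frequencies)                                    # total frequency
products = [f * x for f, x in zip(frequencies, class_marks)]
grouped_mean = sum(products) / N                        # formula (2)

print(N, sum(products), grouped_mean)  # prints: 100 7250.0 72.5
```

The grouped value 72.5 may then be compared with the value 72.67 that the text reports for the ungrouped grades of Table 1.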

6. Translation of Axes; Deviations. It is frequently useful to
employ the methods and results of geometry in connection with the
problems of statistics. Foremost among these methods is the
representation of numbers by points on a line; an origin and a unit of
measure having been chosen, a coordinate is assigned to each point on
the line. When a frequency distribution is represented by a graph,
we have seen in Chapter II that the variate values are used as
abscissas or measurements along the x-axis. The mean is therefore the
point on the x-axis whose coordinates are ($\bar{x}$, 0). Its position may
be emphasized by drawing a vertical line through this point, but it is
the horizontal distance of the point from the origin and not the
vertical line which represents graphically the mean.

In discussing the variates we may often work with smaller numbers
by changing the origin of reference. If new axes, x'y', are taken
parallel to the old axes, xy, with positive directions preserved, the
axes are said to be translated from one position to the other. A
translation of axes corresponds to a transformation of coordinates. Thus if
we let

$$x' = x - x_0, \qquad y' = y - y_0,$$

the origin is translated to the point $(x_0, y_0)$. Since the variates are
denoted by x we are concerned here only with the transformation
$x' = x - x_0$ which translates the origin to the point $(x_0, 0)$. The
variates referred to a new origin are often called deviations. In
particular if we translate the origin to the mean by letting

$$x' = x - \bar{x},$$

then for a frequency distribution the deviations are the values
obtained by subtracting $\bar{x}$ from each of the class marks. Thus,

$$x_1' = x_1 - \bar{x}, \quad x_2' = x_2 - \bar{x}, \quad \ldots, \quad x_k' = x_k - \bar{x}.$$

The units of measurement remain unchanged. Figure 5 shows the
two systems when the axes are translated to ($\bar{x}$, 0). Obviously, any
variates that are larger than $\bar{x}$ will be positive in terms of x' and any
variates smaller than $\bar{x}$ will be negative in terms of x'.

7. Properties of $\bar{x}$. There are two important properties of $\bar{x}$
which may be stated in the following theorems:

Theorem VI. The algebraic¹ sum of the deviations of all the variates
from their arithmetic mean is zero.

Proof: Let $x_i'$ represent a deviation from the mean. Multiplying
each different deviation by the number of times it occurs and adding
these products we have,

$$\sum_{1}^{k} f_i x_i' = \sum_{1}^{k} f_i (x_i - \bar{x})$$

$$= \sum_{1}^{k} f_i x_i - \sum_{1}^{k} f_i \bar{x}, \text{ by Theorem I}$$

$$= \sum_{1}^{k} f_i x_i - \bar{x}\sum_{1}^{k} f_i, \text{ by Theorem II.}$$

Recalling from (2) that $\sum_{1}^{k} f_i x_i = N\bar{x}$, and that $\sum_{1}^{k} f_i = N$, we have

(3)  $\sum_{1}^{k} f_i x_i' = 0$.

Theorem VII. If the variates are referred to a new origin $x_0$ and
expressed in units of c by means of the transformation

(4)  $u = \frac{x - x_0}{c}$,

then the old mean, $\bar{x}$, is related to the new mean, $\bar{u}$, by the following
formula:

(5)  $\bar{x} = c\bar{u} + x_0$.

Proof: From (4),

(4a)  $x = cu + x_0$,

and substitution of this value for x in definition (2) gives

$$\bar{x} = \frac{1}{N}\sum_{1}^{k} f_i (c u_i + x_0).$$

By Theorems I and II this equals

$$c \cdot \frac{1}{N}\sum_{1}^{k} f_i u_i + x_0 \cdot \frac{1}{N}\sum_{1}^{k} f_i.$$

But the first of these expressions is, by definition, c times the mean
value of u, and the second is, from (2), simply $x_0$. Therefore

$$\bar{x} = c\bar{u} + x_0.$$

This is an important relation and its derivation should be mastered.
Observe that the size of a u-unit will be c times as large as the size
of an x-unit.

¹ That is, taking account of sign. Some of the deviations will be positive and
some negative.

Corollary. If the mean of the deviations of the variates from any
arbitrary number, $x_0$, is found and added algebraically to $x_0$, the result is
the mean $\bar{x}$. In symbols,

(6)  $\bar{x} = \frac{1}{N}\sum_{1}^{k} f_i (x_i - x_0) + x_0$.

The proof follows from (4) and (5).

In (5) and (6), $x_0$ may be regarded as a provisional mean, and the
first term in the right members may be thought of as the correction
to be added algebraically to the provisional mean in order to get the
true mean.
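Theorem VI and the corollary can be illustrated with the eight values of the example in §4. In the sketch below the provisional mean `x0 = 6` is a hypothetical choice made only for illustration; any other number would serve.

```python
values = [6, 8, 7, 6, 5, 7, 6, 5]   # the example of section 4
N = len(values)
mean = sum(values) / N               # 6.25 by formula (1)

# Theorem VI: deviations from the mean add up to zero.
deviation_sum = sum(v - mean for v in values)

# Corollary, formula (6): mean of the deviations from a provisional
# mean x0, added algebraically to x0, gives the true mean.
x0 = 6                               # hypothetical provisional mean
correction = sum(v - x0 for v in values) / N
mean_from_x0 = x0 + correction

print(deviation_sum, correction, mean_from_x0)  # prints: 0.0 0.25 6.25
```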

8. Short Methods of Computing $\bar{x}$. In certain cases, the method
of computing the mean by (2), as shown in Table 14, can be
simplified by use of Theorem VII.

Case I (class intervals equal). If the class marks are equispaced,
let c equal the class interval and choose $x_0$ as one of the class marks,
usually the one opposite the largest frequency. From (4), $x_0$ becomes
the origin of u, because when $x = x_0$, u = 0.

The method of using (5) is illustrated in Table 15. Here
c = 10 and we choose $x_0 = 74.5$, so (4) becomes

$$u = \frac{x - 74.5}{10}.$$

Substituting the given values of x in this relation we get the values in
the u column. So in running the fu column, small values of u are
multipliers of the larger values of f. Then

$$\bar{u} = \frac{1}{N}\sum fu = \frac{-20}{100} = -0.2,$$

so from (5),

$$\bar{x} = 10(-0.2) + 74.5 = 72.5.$$

It should be evident that the final value obtained for the mean is
independent of the choice of the arbitrary value $x_0$. This choice is
only a rough guess and it is really immaterial which of the given
values is selected as $x_0$, except that the nearer it is to the mean the
lighter will be the calculations to follow. A check on the arithmetic
may, therefore, be effected by selecting a different provisional mean.


Table 15. Mean of 100 Grades Using Class Interval as Unit

x        u      f      fu
34.5     -4      2      -8
44.5     -3      3      -9
54.5     -2     11     -22
64.5     -1     20     -20
74.5      0     32       0
84.5      1     25      25
94.5      2      7      14
Totals         100     -20


This indirect method is sometimes called coding because the
variates are coded to another scale in which it is easier to compute the
mean. Formula (5) is the relation, then, for transforming the mean
from one scale to another.

If one's statistical interests are limited to computing means, then
(2) cannot be improved upon if calculating machines are to be used.
It should be understood, however, that techniques must be
developed now for subsequent purposes. The indirect method is part of
a pattern which is useful in later chapters. From this standpoint,
one should practice using it at this stage when $N = \sum_{1}^{k} f_i$ is large
and the x's are equispaced.
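The coding procedure of Case I can be sketched as a small function. This is an illustration under the assumptions of the text (equispaced class marks, class interval c); the function name `coded_mean` and the second provisional mean 54.5 are hypothetical choices made here to show that the result does not depend on $x_0$.

```python
def coded_mean(class_marks, frequencies, x0, c):
    """Mean by the coding method: u = (x - x0)/c, then x_bar = c*u_bar + x0."""
    N = sum(frequencies)
    u = [(x - x0) / c for x in class_marks]                  # relation (4)
    u_bar = sum(f * ui for f, ui in zip(frequencies, u)) / N
    return c * u_bar + x0                                    # formula (5)


# Data of Table 15.
class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [2, 3, 11, 20, 32, 25, 7]

mean_a = coded_mean(class_marks, frequencies, x0=74.5, c=10)
mean_b = coded_mean(class_marks, frequencies, x0=54.5, c=10)  # different x0

print(round(mean_a, 6), round(mean_b, 6))  # prints: 72.5 72.5
```

Repeating the computation with a second provisional mean is the arithmetic check that the text recommends.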

Case II (class intervals unequal). Occasionally a frequency
distribution is encountered in which the variates are not equispaced;
it is then usually best to take c = 1 (unless the x's have a common
factor c) and be content with whatever simplification results from a
suitable choice of $x_0$. This is equivalent to using the above corollary.

In Table 16, we choose $x_0 = 200$ and are thus able to simplify the
work a little.

Table 16

x          f      u = x - 200      uf
106.12      7       -93.88         -657.16
191.83     14        -8.17         -114.38
246.48     32        46.48         1487.36
283.63     49        83.63         4097.87
257.65     55        57.65         3170.75
294.51     54        94.51         5103.54
222.53     35        22.53          788.55
 71.43     14      -128.57        -1799.98
Totals    260                     12076.55

$$\bar{u} = \frac{1}{N}\sum f_i u_i = \frac{1}{N}\sum f_i (x_i - 200) = \frac{12076.55}{260} = 46.448,$$

$$\bar{x} = \bar{u} + x_0 = 246.45.$$


9. Geometric Explanation. Let us consider further the relation
between the variables x and u, defined by the expression

(4)  $u = \frac{x - x_0}{c}$.

A geometric explanation will be helpful.

Graphically, the x values are distances along the x-axis measured
from zero as origin. Likewise $x_0$ is some point on the x-axis at a
distance of $x_0$ units from zero. If now the points representing the
x values are measured from $x_0$ as origin they are denoted by $x - x_0$.
(See Figure 6.) Thus if $x_0 = 24$, a value which is 36 with reference
to the origin of x will be 12 with reference to $x_0$; likewise a value
x = 18 becomes $x - x_0 = -6$ when referred to $x_0$ as origin. It
should be noted that $x - x_0$ is in the same units as x. Thus if x is in
inches, $x - x_0$ will also be in inches. But $(x - x_0)/12$ would then
be in feet. Instead of dividing by 12 suppose we divide by c. Then
$(x - x_0)/c$ will be in units of c whatever c may be. It is convenient
to denote the resulting values by a different letter, say u.
Therefore the numerator of (4) changes the origin of reference but does
not affect the scale of measurement. The denominator changes the
scale, there being c of the x units in one of the u units. Relation (4)
has this generalized meaning apart from statistics. Mathematical
notation is applicable to many different fields of knowledge. A
relation like (4) which occurs in physics is C = (5/9)(F - 32); it
connects temperature on the Centigrade and Fahrenheit scales.

[Fig. 6: the x-axis, showing the point $x_0$ and the distance $x - x_0$]

When (4) is applied to a frequency distribution it is convenient to
select $x_0$ as one of the mid-x values and to take c as the width of the
class intervals. Under Case I, the mean is found with reference to $x_0$
and in units of c. This is the mean, $\bar{u}$, of the numbers representing the
various class intervals weighted with the corresponding frequencies.
After this mean is computed it may be converted back into units of x
by multiplying by c, and then referred to the origin of x by adding $x_0$.
(See Figure 7.) Hence we have $\bar{x} = c\bar{u} + x_0$. Thus we arrive at the
same result as that obtained algebraically.

[Fig. 7: $\bar{u}$ in units of c; $c\bar{u}$ in original units. If $x_0 < \bar{x}$, $c\bar{u}$ is positive; if $x_0 > \bar{x}$, $c\bar{u}$ is negative.]

If we had denoted the variates by y we could have used the relation

$$v = \frac{y - y_0}{c}$$

corresponding to (4). Geometrically, this would mean a change of
units and a translation of origin in the y-direction. The relation
corresponding to (5) would then be

$$\bar{y} = c\bar{v} + y_0,$$

where $\bar{v} = \frac{1}{N}\sum f_i v_i$.

As the short-cut method is an important one, another
illustration is given in Table 17 (based on Table 4). Here we take
u = (x - 2.745)/0.5 = 2(x - 2.745).

Table 17. Computation of Mean Monthly Rainfall at Iowa City, 1890-1925

x               f      u      fu
 0.245         23     -5     -115
 0.745         42     -4     -168
 1.245         58     -3     -174
 1.745         62     -2     -124
 2.245         49     -1      -49
 2.745 = x_0   47      0        0
 3.245         32      1       32
 3.745         27      2       54
 4.245         18      3       54
 4.745         15      4       60
 5.245         14      5       70
 5.745          7      6       42
 6.245         10      7       70
 6.745          5      8       40
 7.245          6      9       54
 7.745          5     10       50
 8.245          3     11       33
 8.745          2     12       24
 9.245          5     13       65
 9.745          0     14        0
10.245          1     15       15
10.745          1     16       16
Totals        432              49

$$\bar{x} = 2.745 + \frac{(0.5)(49)}{432} = 2.802 \text{ inches}.$$



10. Mean of Means. So far we have used subscripts to
distinguish between the variates within a set: $x_1, x_2, \ldots, x_N$. By this time
the student should be thinking easily in this notation so we may now
state an additional use of subscripts. Instead of using x and y to
distinguish between two sets of variates we may use $x_1$ and $x_2$. Then
to distinguish the variates within a set we would add a second
subscript, so for the $x_1$ set the variates are

$$x_{11}, x_{12}, x_{13}, \ldots, x_{1n_1},$$

and for the $x_2$ set the variates are

$$x_{21}, x_{22}, x_{23}, \ldots, x_{2n_2}.$$

These are read "x two one," etc., not "x twenty-one," etc. In the
notation dealing with one set, x was a variable but $x_1, x_2$, etc., were
constants. Now $x_1$ and $x_2$ are variables and $x_{11}, x_{12}, \ldots, x_{21}, x_{22}, \ldots$,
etc., are constants. Thus $x_1$ and $x_2$ may denote the grades of two
sections of mathematics in which there are $n_1$ and $n_2$ students
respectively. Then the mean of the first set is

(a)  $\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i}$,

and the mean of the second set is

(b)  $\bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}$.


We will now state a useful theorem.

Theorem VIII. If the mean of a set of $n_1$ variates is $\bar{x}_1$ and the mean
of another set of $n_2$ variates is $\bar{x}_2$, the mean $\bar{x}$ of the combined sets is

(7)  $\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{N}$,

where $N = n_1 + n_2$.

Proof: It is obvious from equations (a) and (b) that

(c)  $n_1\bar{x}_1 + n_2\bar{x}_2 = \sum_{1}^{n_1} x_{1i} + \sum_{1}^{n_2} x_{2i}$.

If x is allowed to stand for $x_1$ and $x_2$ in succession as shown in the
accompanying table then the right member of (c) may be written
$\sum_{1}^{n_1+n_2} x_i$, which denotes the sum of all the variates when they are
combined into one set. If this latter sum is divided by the total
number of variates N the result is, by definition, their mean. Hence

$$\frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{\sum_{1}^{n_1} x_{1i} + \sum_{1}^{n_2} x_{2i}}{n_1 + n_2} = \frac{\sum_{1}^{n_1+n_2} x_i}{N} = \bar{x}.$$

We may express (7) in more compact notation as follows.

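$$\bar{x} = \frac{1}{N}\sum_{i=1}^{2} n_i\bar{x}_i.$$

i            Variate        x_i
1            x_11           x_1
2            x_12           x_2
3            x_13           x_3
...          ...            ...
n_1          x_1n_1         x_n_1          (the x_1 set ends here)
n_1 + 1      x_21           x_(n_1+1)
n_1 + 2      x_22           x_(n_1+2)
n_1 + 3      x_23           x_(n_1+3)
...          ...            ...
n_1 + n_2    x_2n_2         x_(n_1+n_2)    (the x_2 set ends here)

$$\sum_{i=1}^{n_1} x_{1i} + \sum_{i=1}^{n_2} x_{2i} = \sum_{i=1}^{n_1+n_2} x_i.$$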


This form lends itself to a generalization for k sets so we have the
following theorem.

Theorem IX. The mean of a set of N variates which is composed of k
subsets is

(8)  $\bar{x} = \frac{1}{N}\sum_{i=1}^{k} n_i\bar{x}_i$,

where $\bar{x}_i$ is the mean and $n_i$ is the frequency in the ith subset and
$N = \sum_{i=1}^{k} n_i$.

Corollary. If $n_i = n$ is the same for all the sets, then N = kn and
(8) reduces to

(9)  $\bar{x} = \frac{1}{k}\sum_{1}^{k} \bar{x}_i$.
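Formula (8) can be compared with the mean of the pooled variates. The two small sets below are hypothetical and serve only to illustrate the theorem; the function name `combined_mean` is likewise an assumption of this sketch.

```python
def combined_mean(sizes, means):
    """Formula (8): subset means weighted by the subset frequencies n_i."""
    N = sum(sizes)
    return sum(n * m for n, m in zip(sizes, means)) / N


# Hypothetical subsets, not taken from the text.
set_1 = [70, 80, 90]
set_2 = [60, 65]

sizes = [len(set_1), len(set_2)]
means = [sum(set_1) / len(set_1), sum(set_2) / len(set_2)]   # 80.0 and 62.5

pooled = sum(set_1 + set_2) / (len(set_1) + len(set_2))      # direct mean
by_formula = combined_mean(sizes, means)
plain_average = sum(means) / len(means)                      # formula (9)

print(pooled, by_formula, plain_average)  # prints: 73.0 73.0 71.25
```

Because the two subsets differ in size, the simple average of the subset means (71.25) does not equal the combined mean (73.0); formula (9) applies only when every $n_i$ is the same.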

Exercises

1. (a) Use (1), §3, to find the mean of the following numbers: 18, 42, 23, 16,
103, 61, 49, 95, 113, 10.

(b) For the numbers in (a) verify that the sum of their deviations from their
mean is zero. What theorem does this exercise illustrate?

2. Find the deviations of the numbers in Ex. 1 from 50 and verify that the mean
of these deviations added algebraically to 50 gives the mean of the numbers
themselves.

3. Prove: The sum of the deviations of the variates from their mean is zero.

4. Derive the relation $\bar{x} = c\bar{u} + x_0$.

5. Find the arithmetic mean of the weights of 1000 students given in Table 12.
Use (5). Ans. 138.65 lbs.

6. Find the mean monthly rainfall at Des Moines from 1890 to 1925, using the
frequency distribution which you previously made. Ans. 2.55 inches.

7. Find the mean of the distribution of discrete variates given in Table 11.

8. Prove the following theorem: The mean of a set of variates is unchanged if
each variate is replaced by the mean of all the variates.

9. (a) Prove expressions (8) and (9).

(b) The mean grade of one class of 20 students is 76% and of another class of
15 students is 80%. Find the mean of the two classes.
10. The record of freshman scholastic averages for a semester at a certain
university was given as follows:

           n_i      x̄_i
Men        501      3.550
Women      356      3.639

Find the mean grade for the entire class.

11. Assume that the following fictitious data represent the earnings per week of a
certain type of machine shop labor in Illinois establishments:

Wage Group               Frequency
$00.0 under $10.0            50
 10.0 under  20.0           150
 20.0 under  30.0           400
 30.0 under  40.0           200
 40.0 under  50.0           160
  *                           *
 60.0 under  80.0            40
Total                      1000

* Class omitted. Note the different interval in the last class.

The average earnings per week for this same type of labor in all other states of
the United States where 9000 men are employed, not counting those in Illinois, are
$30.00 per week.

Compute the arithmetic mean wage (a) for Illinois, (b) for the entire United
States.

Recompute the mean wage for Illinois in such a manner as to check, in the
quickest and surest way, the accuracy of the result found in (a) above.

12. Find the mean of the following distribution:

x        f
47.5      7
48.1     17
45.9     46
44.0     44
40.7     54
41.6     43
38.0     35
33.2     14

11. The Mode. That value of the variable which occurs most
frequently is called the mode. Its chief service is in characterizing
a type and it is the kind of average meant by such a phrase as "the
average man." There is some difficulty in giving a precise
definition of the mode without more advanced mathematics. However,
we may say that for a given grouping an approximate value, which
we will call the empirical mode, is given by the class mark having the
largest frequency.¹ Thus, in Table 17 the empirical mode is 1.745
inches.

12. The Median. Instead of finding the mean, suppose the N
variates are arranged in the order of their magnitude. The median is
defined as the value which is greater than half the variates and less
than the other half. A more precise definition is as follows:

Let $x_1, x_2, \ldots, x_N$ be a set of real numbers, which may or may not
be all different, and suppose they are arranged in order of magnitude
so that

$$x_1 \le x_2 \le \cdots \le x_N.$$

Whenever N is odd, N = 2k - 1, the median is $x_k$, the middle one of
the x's. If N is even, N = 2k, the median is not uniquely defined
unless $x_k = x_{k+1}$, in which case the median is this common value.
Otherwise, the definition is satisfied by any value of x belonging to
the interval

$$x_k \le x \le x_{k+1},$$

and the median is to this extent indeterminate. In this case it is
conventional to take

$$\tfrac{1}{2}(x_k + x_{k+1})$$

as the median.


Example. Find the median of the following set of numbers: 10, 6, 5, 25, 15, 18,
20.

Arranging them in order of magnitude we find the median to be 15 (the mean is
14.14). If we add another value, 37, to make N even, the median is ½(15 + 18) =
16.5 (the mean is 17).
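The definition for ungrouped variates translates directly into code. The sketch below follows the convention stated above for even N; the function name `median` is illustrative, and Python's zero-based indexing shifts the subscripts by one relative to the text.

```python
def median(values):
    """Median of ungrouped variates, using the convention for even N."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty set is undefined")
    k = n // 2
    if n % 2 == 1:
        return ordered[k]             # middle value when N is odd
    # N even: half the sum of the two middle values.
    return (ordered[k - 1] + ordered[k]) / 2


numbers = [10, 6, 5, 25, 15, 18, 20]
print(median(numbers))                # prints: 15
print(median(numbers + [37]))         # prints: 16.5
```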

13. Median of a Frequency Distribution. Case I. For a
frequency distribution of continuous variates, the median is defined as
follows:

Definition: The median is the value of x for which cum f = N/2.

Given such a frequency distribution we may therefore find its
median by forming a cumulative frequency table and interpolating in
the end-x column for the value of x corresponding to N/2.

The method should be clear from the following illustration.

¹ Another method of computing the mode will be given in a later section.



Example. Find the median for the data of Table 2.

Interval     f      End-x     Cum f
                    29.5        0
30-39        2      39.5        2
40-49        3      49.5        5
50-59       11      59.5       16
60-69       20      69.5       36
70-79       32      Md    <-   50
                    79.5       68
80-89       25      89.5       93
90-99        7      99.5      100


Here, N/2 = 50. This value of cum f corresponds to a value of x
in the interval 69.5-79.5. Therefore the median is 69.5 plus a
fraction of the distance from 69.5 to 79.5. Thus,

End-x      Cum f
69.5        36
Median      50
79.5        68

where $d_1$ = Median - 69.5 and $D_1$ = 79.5 - 69.5 are the partial and
total differences in the End-x column, and $d_2$ = 50 - 36 and
$D_2$ = 68 - 36 are those in the Cum f column.

Assuming that the items in any class interval are uniformly
distributed over that interval, it follows that the partial differences are
proportional to the total differences: $d_1/D_1 = d_2/D_2$. That is,

$$\frac{\text{Median} - 69.5}{79.5 - 69.5} = \frac{50 - 36}{68 - 36},$$

whence,

$$\text{Median} = 69.5 + 10\left(\frac{14}{32}\right) = 69.5 + 4.4 = 73.9.$$


This is called "straight line interpolation" or "interpolation by
proportional parts." The reason for these names is made clear in the
following diagram.

Triangle ABC is similar to triangle AED. Therefore

$$\frac{AB}{AE} = \frac{BC}{ED},$$

$$x = AB = \frac{AE \cdot BC}{ED} = \frac{10(50 - 36)}{68 - 36} = 10\left(\frac{14}{32}\right) = 4.4,$$

Md = 69.5 + x = 73.9.

The following formula may also be used to compute the median:

$$\text{Md} = x_m + \left[\frac{N/2 - f_b}{f_m}\right] c,$$

where $x_m$ is the lower end-value of the median class, N is the total
frequency, $f_b$ the number of variates below the median class, c the
class interval, and $f_m$ the frequency of the median class.
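The interpolation formula can be sketched as a function for equal class intervals. The name `grouped_median` and its argument names are assumptions of this sketch; the data are those of the example for Table 2, with 29.5 as the lower end-value of the first class.

```python
def grouped_median(first_lower_end, c, frequencies):
    """Median by straight-line interpolation, equal class intervals c."""
    N = sum(frequencies)
    if N == 0:
        raise ValueError("empty distribution")
    half = N / 2
    below = 0                                  # variates below current class
    for j, f in enumerate(frequencies):
        if below + f >= half:                  # this is the median class
            lower_end = first_lower_end + j * c
            return lower_end + (half - below) / f * c
        below += f


# Frequencies of the classes 30-39, 40-49, ..., 90-99.
frequencies = [2, 3, 11, 20, 32, 25, 7]
md = grouped_median(29.5, 10, frequencies)
print(md)  # prints: 73.875
```

The exact interpolated value is 73.875, which the text reports as 73.9 after rounding the correction 4.375 to 4.4.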

Case II. In the case of a set of discrete variates there may be no
value in the set such that the number of variates which are larger than
it is equal to the number less than it. Thus in Table 11 the values of x
are integers and 35% of the throws yielded 5 or fewer successes and
65% yielded 6 or more successes. Neither x = 5 nor x = 6, nor any
integer, will exactly split the total frequency into two equal parts.
Of course a formal application of the definition given in Case I will
give a value of x for which cum f is N/2. The difficulty is not so
much in the interpretation of the fractional result because the same
objection could be cited against the mean. But the real difficulty
lies in explaining interpolation in a discontinuous function. We
cannot assume that the given frequencies are distributed over the
interval from one value of x to the next. Perhaps the best we can do
in such cases is to make a statement similar to the one above for
Table 11. At least such a statement serves to summarize the
situation without artificiality.

14. Graphical Interpretation of Mean, Median, and Mode. The
mean corresponds to the abscissa of the point known in mechanics as
the centroid of area. If a thin, homogeneous plate of metal cut in
the shape of a histogram is supported loosely on a horizontal axis
through its centroid, the plate will have no tendency to rotate,
whatever horizontal direction this axis may assume.

[Fig. 9]

The median of a frequency distribution is the abscissa of a point 
through which a vertical line will divide the total area of the histo¬ 
gram into two equal parts. 

If a distribution could be represented by a smooth curve, then the 
mode is the abscissa of the highest point on the curve. 

Figure 9 shows the position of the three averages in a moderately
skew distribution. If the distribution were perfectly symmetrical
then all three of these measures of location would coincide.

There is an interesting empirical relationship between the three quantities
which appears to hold for unimodal curves of moderate asymmetry, namely,

mean - mode = 3 (mean - median).

It is a useful mnemonic to observe that the mean, median, and mode occur
in the same order (or reverse order) as in the dictionary; and that the median is
nearer to the mean than to the mode, just as the corresponding words are nearer
together in the dictionary.¹

15. Discussion. The student primarily interested in the use of
these averages in practical statistics might reasonably inquire,
Which of the three averages mentioned should be used in a given
problem? The answer depends upon certain properties peculiar to
each average and upon the nature of the data to be averaged.

In most cases the mean is a distinctly superior average. It is 
rigorously defined, easily computed, and is most tractable in theoreti¬ 
cal discussions. 

When the median differs considerably from the mean it is likely 
that the median is the more typical value. The advantage of the 
median over the mean is recognized in at least three situations: 

(a) When occasional and unexpected values occur at the ends of
the distribution. In such cases the mean would tend to distort the 
true representation of the typical value, being unduly influenced by 
the exceptional values. 

(b) When the data are presented in a table left open at one or both
ends. For example, suppose the registrar's office of a university
reports the following distribution of grades as given in all departments
for a semester:

Grade        Below 60    60-69    70-79    80-89    90-100
Frequency       215       1060     2217     1242      506

A cum f table may be formed and hence the median can be found 
without any more information about the values less than 60. 

(c) When the observations cannot be measured numerically but
can be ordered.
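The claim in (b) can be illustrated with a short Python sketch that finds the median of the registrar's table by interpolating in the cum f column. The class boundaries (59.5, 69.5, and so on) are an assumption made for the illustration, since the text does not state them; the open "Below 60" class needs no lower boundary because the median does not fall in it.

```python
# Grade distribution from the registrar's table; the lowest class is open-ended.
freqs = [215, 1060, 2217, 1242, 506]

# ASSUMED upper class boundaries (not given in the text), one per class.
upper_ends = [59.5, 69.5, 79.5, 89.5, 100.5]

def median_from_cum_f(freqs, upper_ends):
    """Interpolate linearly in the cum f table at cum f = N/2."""
    half = sum(freqs) / 2
    cum = 0
    for i, f in enumerate(freqs):
        if cum + f >= half:
            if i == 0:
                raise ValueError("median falls in the open-ended class")
            lower = upper_ends[i - 1]
            width = upper_ends[i] - lower
            return lower + (half - cum) / f * width
        cum += f

median = median_from_cum_f(freqs, upper_ends)
print(round(median, 2))
```

Under these assumed boundaries the median lies in the 70-79 class, and nothing about the individual grades below 60 was needed to locate it.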

The mode is best adapted to situations where the word "usual"
would be appropriate. Unless a large number of items are considered,
the mode can have little practical meaning. It is the appropriate
average in certain questions of marketing because manufacturers
are interested in the type or quality which is usually in demand.
Or again, in an investigation concerning wages and cost of living, the
mode would reflect the average situation. Also, in a mathematical
treatment of frequency curves the concept of the mode is very useful.

¹ M. G. Kendall, The Advanced Theory of Statistics, vol. I, p. 35. Lippincott.

Sometimes a distribution has more than one mode, although this is 
usually due to heterogeneous material. In this course we will be 
concerned only with unimodal distributions. 

The above remarks about the appropriateness of various averages 
are made from the standpoint of describing and condensing the data 
per se. A few remarks from a different point of view should perhaps 
be added here. In the theory of sampling, which deals to a large 
extent with estimating from a sample certain constants in the parent 
universe, it is shown that the mean has definite advantages. The 
mean is much more efficient¹ than the median, for example, in estimating
the corresponding average in the universe (except in a special
case when the universe is an unusual type).

For a more complete treatment of the applicability of these three 
averages, the student is referred to the following books: 

1. Theory of Statistics — Yule and Kendall, Ch. VII. 

2. The Mathematics of Statistics — Burgess, Ch. V. 

3. Mathematical Statistics — Camp, p. 40. 

Exercises 

1. State what the empirical mode is in each of Tables 8 to 13.

2. Explain why the median is found from interpolating in the end-x column
and not the mid-x column.

3. Read one or more of the references in §15 and write an essay on the
advantages and limitations of the mean, median, and mode.

4. Find the median IQ for the data in Table 7.

5. Find the median for the data in Table 9.

16. Geometric Mean. The geometric mean of a set of N
positive values is the Nth root of their product. Thus, the geometric
mean (G.M.) of two values is the square root of their product, of three
values the cube root of their product, and in general for the N values
\(y_1, y_2, \ldots, y_N\),

(10)    \( \text{G.M.} = [\,y_1 \cdot y_2 \cdot y_3 \cdots y_N\,]^{1/N} \).

¹ See Economic Control of Manufactured Products, W. A. Shewhart, p. 280.
D. Van Nostrand Co.

Equation (10) lends itself to the use of logarithms and frequently they
greatly facilitate the computation of G.M. From (10) we have

(11)    \( \log \text{G.M.} = \dfrac{1}{N}\,[\log y_1 + \log y_2 + \cdots + \log y_N] \).

Therefore the arithmetic mean of the logarithms of a set of values 
is the same as the logarithm of the geometric mean of the values 
themselves. 

Examples: Find the geometric mean of

(a) 3, 6, 12, 24, 48.

Solution:

\( \text{G.M.} = [(3^5)(2^{10})]^{1/5} = (3)(2^2) = 12. \)

(b) 7.96, 13.82, 22.95, 35.34.

Solution:

log 7.96  = 0.90091
log 13.82 = 1.14051
log 22.95 = 1.36078
log 35.34 = 1.54827
      sum = 4.95047
log G.M.  = 4.95047 / 4 = 1.23762
G.M.      = 17.28
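The logarithmic method of (11) can be sketched in Python as follows. The function name `geometric_mean` is chosen here for illustration, and the second value in example (a) is taken to be 6, which is what the printed solution \((3^5)(2^{10})\) implies.

```python
import math

def geometric_mean(values):
    """G.M. by equation (11): antilog of the arithmetic mean of the logs."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log

gm_a = geometric_mean([3, 6, 12, 24, 48])            # example (a)
gm_b = geometric_mean([7.96, 13.82, 22.95, 35.34])   # example (b)
print(round(gm_a, 2), round(gm_b, 2))
```

Working through the logarithms avoids forming the full product, which is the same convenience the five-place table gives in the hand computation.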

The geometric mean is the appropriate average when the data are
limited at one end of the range and unlimited at the other, and there
tends to be a constant rate of change from one y value to the next.
This is characteristic of values which tend to form a geometric
progression, i.e., which tend to follow the simple exponential law

(12)    \( y = ar^x \).

The student will recall from algebra that a geometric progression can 
be put in the form 


x     0     1     2       ...    x
y     a     ar    ar^2    ...    ar^x


The value of any term in the y series is a function of the exponent of r 
since a and r are constants. The functional relationship is therefore 
represented by (12). 

The growth of many quantities in nature follows this law and it is
sometimes called the law of natural growth. With x referring to
time, y may represent, for example, the population of a city, the
enrollment of a school, the weight of a quantity, or the number of
bacteria in a culture. The accumulated amount S of P dollars
invested at rate of interest i, compounded periodically for n periods,
also takes the form of (12), namely,

\( S = P(1 + i)^n \),

where r is now (1 + i), a is P, and n and S are the variables
corresponding to x and y.

Thus, if $1000 increased at compound interest to $2150 in 31 years,

    Amount:   $1000    ...    $2150
    Year:     0   1   2   ...   30   31

the geometric average rate at which the money increased is found
as follows:

\( r^{31} = (1 + i)^{31} = \dfrac{2150}{1000} \)

\( 1 + i = (2.15)^{1/31} = 1.025 \)

\( i = 2\tfrac{1}{2}\% \).

Since there was an increase of \( \dfrac{1150}{1000} = 115\% \), the arithmetic average
would be \( \dfrac{115}{31} = 3.7\% \), which is also the simple interest rate.
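The contrast between the geometric (compound) and arithmetic (simple) average rates in this example may be checked with a few lines of Python. The variable names are illustrative only.

```python
principal, amount, periods = 1000.0, 2150.0, 31

# Geometric average rate: solve amount = principal * (1 + i) ** periods for i.
compound_rate = (amount / principal) ** (1 / periods) - 1

# Arithmetic average rate: the total fractional increase spread evenly
# over the periods, which is the simple-interest rate.
simple_rate = (amount - principal) / principal / periods

print(f"compound: {compound_rate:.4f}   simple: {simple_rate:.4f}")
```

The compound rate comes out close to 2.5 per cent and the simple rate close to 3.7 per cent, in agreement with the figures in the text.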

If y in equation (12) represents population, and we are given two 
values of y corresponding to two dates N years apart, the geometric 
mean enables us to find a fairer estimate of the value of y at the mid 
date than would be given by any other average. For example, 
suppose we are given that the population of a city was 2500 in 1920 
and 5000 in 1930. We wish to estimate the population in 1925 and 
to find the average annual rate of increase. If we are given no other 
information, our best estimate for 1925 is given by 

\( \text{G.M.} = (y_1 \cdot y_2)^{1/2} = (2500 \times 5000)^{1/2} = 3535. \)

The average annual rate of increase is obtained by solving (12) for r as
follows:

\( 5000 = 2500\,r^{10} \)

\( 2 = r^{10} \)

Hence \( r = \sqrt[10]{2} = 1.0718 = 107.18\% \), so that the average annual
rate of increase is 7.18%. It is now possible to estimate the population
for any intermediate year. Thus, for 1928, we have from (12):

\( y = 2500(1.0718)^{8} = 4353. \)
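The population estimates can be reproduced with the following Python sketch, which assumes, as the text does, that growth follows (12) at a constant annual ratio between the two census dates. The helper name `population` is hypothetical.

```python
y_1920, y_1930, years = 2500.0, 5000.0, 10

# Annual growth ratio r from 5000 = 2500 * r**10.
r = (y_1930 / y_1920) ** (1 / years)

# Mid-date estimate as the geometric mean of the two end values.
mid_estimate = (y_1920 * y_1930) ** 0.5

def population(year):
    """Estimate from y = a * r**x with x counted in years from 1920."""
    return y_1920 * r ** (year - 1920)

print(round(r, 4), round(mid_estimate), round(population(1928)))
```

The geometric mean of the end values and the value of (12) at 1925 agree, which is why the G.M. is the natural mid-date estimate under this law.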


The geometric mean is also used in economics in averaging index 
numbers which are essentially the ratios of prices of commodities at 
one date to their prices at another date. In general it is the appropriate
average when emphasis is on the rate or percentage of change
rather than the amount.

17. Harmonic Mean. Another average which has long been
known and which is required in certain problems is the harmonic mean
(H.M.). For the N positive values \(x_1, x_2, \ldots, x_N\), it is defined as the
reciprocal of the arithmetic mean of the reciprocals of the values.
In symbols,

(13)    \( \text{H.M.} = \dfrac{1}{\dfrac{1}{N}\sum\limits_{i=1}^{N}\dfrac{1}{x_i}} \).


This measure is used in averaging ratios, such as rates and prices, when 
certain conditions are agreed upon. 

In the case of time rates, we have ratios between two quantities 
one of which is in units of time, which we will denote by t, and the 
other is in units of some element like distance or accomplishment or 
temperature, etc. Denote this second element, different from time, 
by d. Then we make the following observations: 

(a) A rate may be stated either in the form d/t or in the form t/d.
Thus, a car which travels at the rate of 30 miles per hour may also be
said to travel at the rate of 2 minutes per mile. In this illustration
the second form is not the usual way of expressing the rate, but there
are cases in which the form t/d is usual. When we say a man takes
10 seconds to run 100 yards we are expressing his rate in time per
unit of distance (t/d).

(b) In averaging rates one should first decide whether d or t should
properly be the basic or "fixed" element in the discussion. Occasionally
there is a difference of opinion about which element should
most appropriately be regarded as fixed. For example, suppose a
class of students has been given 15 minutes in which to work as many
as they can of a given list of problems, and the number of problems
worked correctly by each student recorded. Some educational
statisticians would say that time should be the fixed element here and
that number of problems solved (in a unit of time) should be the variable.
Others would say that the number of minutes (t) a student
required to work one problem is the proper variable and that
a problem (d) should be regarded as the fixed element in the discussion.

In one case the rates are equally weighted in the sense of time 
and in the other case they are equally weighted in the sense of the 
element d. 

(c) The harmonic mean of the rates expressed in the form d/t gives
the same result as the arithmetic mean of the same rates expressed in
the form t/d. This is evident from equation (13) if it is written in the
form

\( \dfrac{1}{\text{H.M.}} = \dfrac{1}{N}\sum \dfrac{1}{x_i} \),

and from the fact that rates in one form are merely the reciprocals of
the same rates in the other form.

As an illustration, let us consider three cars:

I.    A travels at the rate of 15 miles per hour (1/4 mile per minute),
      B travels at the rate of 20 miles per hour (1/3 mile per minute),
      C travels at the rate of 30 miles per hour (1/2 mile per minute).

But their rates could just as well have been stated as

II.   A travels at the rate of 4 minutes to the mile,
      B travels at the rate of 3 minutes to the mile,
      C travels at the rate of 2 minutes to the mile.

The harmonic mean of the rates as stated in I is 20 miles per hour,
i.e., 1/3 of a mile per minute, and the arithmetic mean of the rates as
stated in II is 3 minutes per mile or again, 20 miles per hour.
(Verify.)

The third observation, i.e., (c) above, suggests the following discussion.
The arithmetic mean of the rates in I is 21⅔ m.p.h. and this is
the harmonic mean of the rates as stated in II.
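The verification asked for in the three-car illustration can be carried out exactly with Python's `fractions` module. The helper names below are illustrative; the rates are those of lists I and II.

```python
from fractions import Fraction

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def harmonic_mean(xs):
    # Reciprocal of the arithmetic mean of the reciprocals, as in (13).
    return len(xs) / sum(1 / x for x in xs)

mph = [Fraction(15), Fraction(20), Fraction(30)]   # list I, form d/t
min_per_mile = [60 / v for v in mph]               # list II, form t/d: 4, 3, 2

print(harmonic_mean(mph), arithmetic_mean(min_per_mile))   # 20 m.p.h. and 3 min per mile
print(arithmetic_mean(mph), harmonic_mean(min_per_mile))   # 65/3 m.p.h. and 36/13 min per mile
```

Since 36/13 minutes per mile is the same speed as 65/3 = 21⅔ miles per hour, each pair describes one and the same average, stated in the two forms.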

The question arises, which is the correct average, 20 m.p.h. or 21⅔
m.p.h.? The problem is indeterminate until it is agreed whether
time or distance is the fixed element. The correct average will differ
according to the condition agreed upon. This will be made clear in
the following analysis.

Case I. Let

\( x_i = \dfrac{d_i}{t_i} \)

denote the ith rate, i = 1, 2, ..., n. Then the average rate is

\( \dfrac{D}{T} = \dfrac{\text{total distance}}{\text{total time}} = \dfrac{t_1 x_1 + t_2 x_2 + \cdots + t_n x_n}{t_1 + t_2 + \cdots + t_n} \).

Condition 1. Let distance be the fixed element, i.e., let d be constant.
Then \(d = t_i x_i\), and \(t_i = d/x_i\). Therefore, the expression for
average rate becomes

\( \dfrac{\sum t_i x_i}{\sum t_i} = \dfrac{nd}{d\sum \dfrac{1}{x_i}} = \dfrac{1}{\dfrac{1}{n}\sum \dfrac{1}{x_i}} \),

which is the harmonic mean.

Condition 2. Suppose t is the fixed element. Then \(\sum t_i x_i\)
becomes \(t\sum x_i\) since t is a constant, and \(\sum t_i\) becomes nt. Hence, we
have for the average rate,

\( \dfrac{D}{T} = \dfrac{t\sum x_i}{nt} = \dfrac{\sum x_i}{n} \),

which is the arithmetic mean.

Case II. Let \(x_i = t_i/d_i\) denote the ith rate. Then the average
rate is

\( \dfrac{T}{D} = \dfrac{\text{total time}}{\text{total distance}} = \dfrac{d_1 x_1 + d_2 x_2 + \cdots + d_n x_n}{d_1 + d_2 + \cdots + d_n} \).

Condition 1. Suppose d is the fixed element. Then \(t_i = d x_i\) and
\(d = t_i/x_i\). Hence, we have

\( \dfrac{T}{D} = \dfrac{d\sum x_i}{nd} = \dfrac{\sum x_i}{n} \).

Condition 2. Let t be fixed. Then \(d_i = t/x_i\) and the average
rate is

\( \dfrac{T}{D} = \dfrac{nt}{t\sum \dfrac{1}{x_i}} = \dfrac{1}{\dfrac{1}{n}\sum \dfrac{1}{x_i}} \).

We therefore state the following rules for averaging rates:

Rule 1. The harmonic mean is used whenever the fixed element is d 
and the rates are expressed in the form d/t, or when the fixed element 
is t and the rates are expressed in the form t/d. 

Rule 2. The arithmetic mean should be used when the fixed ele¬ 
ment is t and the rates are expressed in the form d/t, or when the fixed 
element is d and the rates are expressed in the form t/d. 
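Rules 1 and 2 reduce to a single test: the harmonic mean applies when the fixed element is the numerator of the form in which the rates are stated, and the arithmetic mean applies otherwise. A minimal Python sketch of that reading follows; the function `average_rate` and its string arguments are hypothetical conveniences, not notation from the text.

```python
def arithmetic_mean(rates):
    return sum(rates) / len(rates)

def harmonic_mean(rates):
    return len(rates) / sum(1 / x for x in rates)

def average_rate(rates, form, fixed):
    """Choose the average by Rules 1 and 2.

    form  -- "d/t" or "t/d": how each rate is expressed
    fixed -- "d" or "t": the element agreed upon as fixed
    """
    if form not in ("d/t", "t/d") or fixed not in ("d", "t"):
        raise ValueError("form must be 'd/t' or 't/d'; fixed must be 'd' or 't'")
    # Rule 1: the fixed element is the numerator of the stated form.
    if form[0] == fixed:
        return harmonic_mean(rates)
    # Rule 2: the fixed element is the denominator of the stated form.
    return arithmetic_mean(rates)

cars_mph = [15, 20, 30]
equal_distance = average_rate(cars_mph, "d/t", "d")   # each car covers the same distance
equal_time = average_rate(cars_mph, "d/t", "t")       # each car runs the same length of time
print(round(equal_distance, 2), round(equal_time, 2))
```

Applied to the three cars, the first call gives the 20 m.p.h. figure and the second the 21⅔ m.p.h. figure, so the two answers correspond to the two conditions and not to any error of arithmetic.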

In the case of prices, which are of course ratios, a similar discussion 
holds except that now the unit of time is to be replaced by a unit of 
money. Therefore, prices are ratios between two quantities, one of 
which is in units of money and the other in units of some commodity 
or service. They may be stated as so much money per unit of commodity
(m/c), or as so many units of commodity per dollar (c/m).
Thus, if 100 bushels of wheat are exchanged for 75 dollars of gold, the
price of the wheat in terms of gold is 75 ÷ 100, or three-fourths of a
dollar of gold per bushel of wheat. Contrariwise, the price of gold
in terms of wheat is 100 ÷ 75, or one and one-third bushels of wheat
per dollar of gold. Thus, there are always two prices in any exchange.

The correct average will depend upon how the prices are stated and 
upon whether a unit of the commodity (or service) or a unit of money 
is the fixed element. 

The following papers in The Journal of the American Statistical 
Association are recommended: 

1. "The Nature and Use of the Harmonic Mean," W. F. Ferger,
vol. 26 (1931), pp. 36-40.

2. "Calculating the Geometric Mean from a Large Amount of
Data," Zenon Szatrowski, vol. 41 (1946), pp. 218-220.


Examples 

1. In a certain factory a unit of work is completed by A in four minutes, by B
in five minutes, by C in six minutes, by D in ten minutes, and by E in
twelve minutes. What is their average rate of working? At this rate
how many units will they complete in a six-hour day?

Solution. The rates are expressed in the form t/d but it would seem appropriate
to regard t as the basic or fixed element since output per unit of
time appears to be the important consideration here. So by Rule 1,

\( \text{H.M.} = \dfrac{5}{\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{10} + \frac{1}{12}} \),

that is,

\( \text{H.M.} = \dfrac{300}{48} = 6\tfrac{1}{4} \) minutes per unit.

In 360 minutes they will complete

\( 5 \times 360 \div \tfrac{25}{4} = 288 \) units.
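The factory example can be cross-checked by computing the day's output two ways: from the harmonic mean, and directly by adding what each worker produces in 360 minutes. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

minutes_per_unit = [Fraction(m) for m in (4, 5, 6, 10, 12)]   # workers A to E
workers = len(minutes_per_unit)

# Harmonic mean of the times per unit (Rule 1: form t/d, t fixed).
hm = workers / sum(1 / m for m in minutes_per_unit)

# Output in a 360-minute day, first from the average, then worker by worker.
units_from_hm = workers * 360 / hm
units_direct = sum(360 / m for m in minutes_per_unit)

print(hm, units_from_hm, units_direct)
```

The two totals agree, which is the sense in which the harmonic mean is the correct average here; the arithmetic mean of the five times would not reproduce the direct count.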


2. A tourist purchases gasoline at three stations, as follows:

Station                                      A     B     C
Number of gallons of gasoline for $1.00      5     7     6

Here the prices are given in the form c/m and it would seem appropriate to regard
the gallon (c) as the fixed element and prices (m) per gallon as the variable quantities
which are to be averaged. Hence, replacing d/t by c/m and "rates" by "prices"
in Rule 1, we are led to find the harmonic mean.

\( \text{H.M.} = \dfrac{3}{\frac{1}{5} + \frac{1}{6} + \frac{1}{7}} = \dfrac{630}{107} \) gal. per $1.00,

or \( \dfrac{107}{630} \) dollars per gal.


Exercises 

1. (a) The arithmetic mean of a set of 30 numbers is 82. What is the sum of
these numbers?

(b) The G.M. of ten numbers is 1.40. What is the product of these ten
numbers?

2. In chemistry a student was graded 65 in final examination, 85 in recitation
and 80 in laboratory. These grades were weighted 1, 2, and 3 respectively.
Find the student's average grade.

3. At the end of his first semester in college a freshman had credits as follows:
4 hours of mathematics with a grade of 88, 4 hours of English with a grade
of 80, 3 hours of history with a grade of 85, and 4 hours of physics with a
grade of 78. What was his average grade per hour of credit?

4. Find the median of Table 12.

5. The population of a city increased in 5 years from 225,000 to 245,000. What
was the average increase per year? What was the average annual rate of
increase?

6. The number of bacteria in a certain culture was found to be 4 X 10® at noon of 
one day. At noon the next day the number w^as found to be 9 X 10®. 
If the number increased at a constant rate per hour, how many bacteria 
were there at midnight? 
7. Find the average (G.M.) rate of interest for five years during which the
interest rates were 4.25%, 5.3%, 4.65%, 3.86%, 4.38%.

Hint. \( (1 + i)^5 = (1.0425)(1.063)(1.0465)(1.0386)(1.0438) \).

8. Find the harmonic mean of the first fifteen positive integers.

9. For two positive numbers, a and b, the geometric mean is \(x = \sqrt{ab}\). This is
also called the mean proportional between a and b, since a : x = x : b.
By drawing a semicircle on a + b as diameter, show how the value of x can
be constructed geometrically.

10. The following table gives the population of the U. S. at each 10-year census
from 1860 to 1920.

Year      Population      Ratio of Each Census
 x        (millions)      Figure to Preceding
1860         31.4
1870         38.6               1.23
1880         50.2               1.30
1890         63.0               1.25
1900         76.0               1.20
1910         92.0               1.21
1920        105.7               1.15

What is the average rate of increase per decade? Using this average,
estimate the population for 1930 from the 1920 census figure.

11. If a series of positive variates form a geometric progression show that their
logarithms form an arithmetic progression.

12. Find the geometric mean of the following:

(a) 2, 4, 8, 16, 32,

(b) 47, 92, 123, 218.

13. Given two sets of n positive variates each:

\( x_{11}, x_{12}, x_{13}, \ldots, x_{1n} \)
\( x_{21}, x_{22}, x_{23}, \ldots, x_{2n} \).

Prove that the geometric mean of the ratios of corresponding variates in
the two sets is equal to the ratio of their geometric means.

14. (a) For a frequency distribution of positive variates show that (10) becomes

\( \text{G.M.} = [\,x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}\,]^{1/N} \),

where k is the number of different values of x in the set, any exponent \(f_i\) is
the number of times \(x_i\) is repeated, and \(N = \sum_{i=1}^{k} f_i\).

(b) What is the expression for log G.M. when G.M. is defined as in (a)?

15. A wholesale firm has twelve travelling salesmen who make trips of essentially
the same length. Of these, eight make their trip in 20 days and four in 15
days. What is the average time per trip? Ans. 18 days.

16. State two rules for averaging prices similar to those given for averaging rates.
Give illustrations.

17. Consider any two positive variates \(x_1\) and \(x_2\). Prove that their geometric
mean is equal to the geometric mean between their arithmetic mean and
their harmonic mean.

18. (Burgess) The following problem arose in a statistical office in Washington
during World War I: Suppose 20 boats make 6 trans-Atlantic trips
each per year, giving as the time for a "turn around" (i.e., time between
consecutive departures from the same ports), one-sixth year, approximately
60 days, and that 10 boats make 4 trips per year, giving as their
time for a turn around one-fourth year, approximately 90 days. (A
year of 360 days is used merely for convenience.) What is the average
number of days per turn around?

Hint. If we think of the rates expressed as "trips per year" then
x = d/t. If t is regarded as the fixed element, then by Rule 2 the arithmetic
mean is indicated, and x = 6 for 20 values of x, and x = 4 for 10
values.

If we think of the rates expressed as "days per trip" then x = t/d. If
t is the fixed element, by Rule 1 the harmonic mean is the correct average,
and x = 60 for 20 values and x = 90 for 10 values. Ans. 5⅓ trips per
year or 67.5 days per trip.

19. Show that if 2a is the harmonic mean of the two rational numbers b and c,
then the sum of the squares of the three numbers a, b, and c is the square
of a rational number.

(Reference: American Mathematical Monthly, June 1935, p. 394.)

20. (a) If A, G, and H represent, respectively, the arithmetic, geometric, and
harmonic means of N unequal positive variates, prove that

H < G < A.

(Reference: Burgess' text, p. 101.)

(b) What can you say if the N positive variates are equal?

21. A plane travels one half of a given distance D in miles at a speed of \(x_1\) miles
per hour, and the remaining half distance at a speed of \(x_2\) miles per hour.
Show that the average speed for the entire distance is the harmonic mean
of \(x_1\) and \(x_2\). Half of this average speed is called the "radius of action
per hour," i.e., it is the outbound distance that a plane can travel and
return in one hour. The "radius of action" of a plane would be the
"radius of action per hour" multiplied by the number of hours in flight.



CHAPTER IV 
MOMENTS 

1. Moments about an Arbitrary Origin. One of the general prob¬ 
lems of statistics is to summarize and characterize data. In the 
words of R. A. Fisher, 

A quantity of data which by its mere bulk may be incapable of entering the 
mind is to be replaced by relatively few quantities which shall adequately rep¬ 
resent the whole, or which, in other words, shall contain as much as possible, 
ideally the whole, of the relevant information contained in the original data.¹

These relatively few quantities are usually expressed in terms 
of moments. Moments are of different orders and the student is 
already familiar with what is now to be known as the first moment, 
namely, the arithmetic mean of the first powers of the variates. We 
will also need in our work the arithmetic means, respectively, of the 
second, third, and fourth powers of the variates. With reference to 
an arbitrary origin, moments are denoted by v (the Greek letter nu) 
with a subscript specifying the order. 

The first four moments, relative to the x-origin and in the x unit,
are defined as follows:

(1)    \( \nu_1 = \dfrac{1}{N}\sum f_i x_i = \bar{x} \),      \( \nu_2 = \dfrac{1}{N}\sum f_i x_i^2 \),

       \( \nu_3 = \dfrac{1}{N}\sum f_i x_i^3 \),      \( \nu_4 = \dfrac{1}{N}\sum f_i x_i^4 \),

       i varying from 1 to k.

A more general definition of the ν's is

(1a)   \( \nu_r = \dfrac{1}{N}\sum f_i (x_i - x_0)^r \)

for the rth moment about an arbitrary point \(x_0\). When \(x_0 = 0\) and
r = 1, 2, 3, and 4, we have the definitions stated in (1). If r = 0
we have the zeroth moment and \(\nu_0 = 1\).

¹ Foundations of Theoretical Statistics, Philosophical Transactions of the Royal
Society, vol. 222A (1922), p. 309.
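The general definition of the rth moment about an arbitrary point translates directly into a short Python function. The function name `moment` and the three-value distribution used to exercise it are hypothetical, chosen only so that the results are easy to check by hand.

```python
def moment(xs, fs, r, origin=0.0):
    """r-th moment per unit frequency about `origin`:
    (1/N) * sum of f_i * (x_i - origin)**r, with N the total frequency."""
    n = sum(fs)
    return sum(f * (x - origin) ** r for x, f in zip(xs, fs)) / n

# Hypothetical distribution for illustration only.
xs = [1.0, 2.0, 3.0]
fs = [1, 2, 1]

nu = [moment(xs, fs, r) for r in range(5)]      # moments of order 0 to 4 about zero
second_about_mean = moment(xs, fs, 2, origin=nu[1])
print(nu, second_about_mean)
```

With the origin left at zero the function gives the moments of (1); supplying the mean as origin gives a moment about the mean, which anticipates the later sections of the chapter.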

In statistics we work with moments per unit frequency. The
term "moment" has its origin in mechanics where we speak of the
"moment of a force." Suppose we have a rigid bar, called a
lever, with one point of support known as a fulcrum (Figure 10). If
a force \(f_1\) is applied to the lever at a distance \(x_1\) from the fulcrum
O, the product \(x_1 f_1\) is called the moment of the force. If there
are two or more such forces \(f_1, f_2, \ldots, f_k\) acting in the same
direction, and at the distances \(x_1, x_2, \ldots, x_k\), respectively, from O,
the total moment of all these forces is

\( f_1 x_1 + f_2 x_2 + \cdots + f_k x_k = \sum f_i x_i \).

Fig. 10

If the distances x are squared, we have \(\sum f_i x_i^2\) as the total second
moment, and \(\sum f_i x_i^r\) represents the rth moment.

It is by analogy with this mechanical concept that the expressions
in (1) are called statistical moments (per unit frequency) about zero
as origin.


Exercises 

1. Write out the expanded form of the ν's defined in (1).

2. Calculate the values of \(\nu_1\), \(\nu_2\), and \(\nu_3\) for the following distributions:


(a) (b)



moments of odd order may be positive, negative, or zero. 
(c) Show that the odd moments are all zero if both the x's and f's are
symmetrical with respect to the origin of x, as, for example,

x    -1.5    -1.0    -0.5    0.5    1.0    1.5
f      1       2       3      3      2      1


2. Moments in Units of the Class Interval. In Chapter III,
§8, the mean in the x unit was obtained by first finding the mean in
the u unit, viz., \(\frac{1}{N}\sum f_i u_i\), and then changing over into the x unit by
multiplying by the interval c. In our subsequent work, which requires
the higher moments, we shall find it convenient to use a similar
procedure, and find those moments in the u unit, where
\(u = (x - x_0)/c\). It is desirable, therefore, in labeling the moments for
any distribution, to specify whether they are in the unit of x or u.
This is commonly done by the use of a second subscript on ν. Thus
\(\nu_{r:u}\) denotes the rth moment in the u unit and relative to the u-origin.
Therefore,

(2)    \( \nu_{1:u} = \dfrac{1}{N}\sum f_i u_i \),      \( \nu_{2:u} = \dfrac{1}{N}\sum f_i u_i^2 \),

       \( \nu_{3:u} = \dfrac{1}{N}\sum f_i u_i^3 \),      \( \nu_{4:u} = \dfrac{1}{N}\sum f_i u_i^4 \).

Similarly, \(\nu_{r:x}\) will mean \(\frac{1}{N}\sum f_i x_i^r\). When there is no ambiguity, the
second subscript on ν may be omitted.

3. Moments about the Mean. Formulas (1) and (2) define the 
moments taken about zero as origin although in different units. 
When the mean is chosen as origin we have the most important set 
of moments in the theory of statistics. In this case the Greek letter
μ (mu) is used to denote the moments, and it is always understood
that the use of μ specifies the mean as origin. It does not, however,
designate the unit, so the second subscript may still be necessary.
Therefore, the rth moment about the mean is defined by either of the
following expressions:

(3)    \( \mu_{r:x} = \dfrac{1}{N}\sum f_i (x_i - \bar{x})^r \),

       \( \mu_{r:u} = \dfrac{1}{N}\sum f_i (u_i - \bar{u})^r \).

The mean is a sort of balance point. If weights proportional to
the frequencies are suspended along a horizontal bar at distances from 
one end proportional to the numbers representing the class marks, 
then the bar will balance at the weighted mean of the distances. In 
mechanics this point is known as the abscissa of the center of gravity 
or centroid. Theorem VI of Chapter III, §7, is another way of say¬ 
ing that the given distribution is in equilibrium about this point. 

4. Relations between the μ's and ν's. We shall see that the descriptive
constants mentioned at the beginning of the chapter are
defined in terms of the moments about the mean, but the moments
about an arbitrary point are easier to calculate. In other words,
what we desire are the values of \(\mu_r\), but their computation directly
from the definitions (3) may be very laborious even in the u unit due
to the fact that \((u - \bar{u})\) usually involves decimals. Raising these
decimals to the second, third, and fourth powers becomes tedious
even with the aid of a computing machine. On the other hand, the
ν's defined in (2) are readily computed. Therefore, instead of computing
the μ's directly we obtain them indirectly from the ν's. The
relations between the μ's and ν's can be found by expanding, by the
Binomial Theorem, either of the expressions following the Σ in
(3) for r = 2, 3, 4. This is done in the u unit as follows:

       \( \mu_2 = \dfrac{1}{N}\sum f (u - \bar{u})^2 \)

       \( \quad\;\; = \dfrac{1}{N}\sum f u^2 - 2\bar{u}\cdot\dfrac{1}{N}\sum f u + \bar{u}^2\cdot\dfrac{1}{N}\sum f \)

       \( \quad\;\; = \nu_2 - 2\bar{u}\,\nu_1 + \bar{u}^2 \)

(4)    \( \quad\;\; = \nu_2 - (\nu_1)^2 \), since \(\bar{u} = \nu_1\),

       \( \mu_3 = \dfrac{1}{N}\sum f (u - \bar{u})^3 \)

(5)    \( \quad\;\; = \nu_3 - 3\nu_2\cdot\nu_1 + 2(\nu_1)^3 \),

(6)    \( \mu_4 = \nu_4 - 4\nu_3\cdot\nu_1 + 6\nu_2(\nu_1)^2 - 3(\nu_1)^4 \).
These formulas are important and the student should be able to
derive them. It should be apparent that these moment relations
hold also in the x unit. However, if we have the μ's in the u unit
and we desire them in the x unit they may be found as follows:

(7)    \( \mu_{2:x} = c^2\mu_{2:u} \),
       \( \mu_{3:x} = c^3\mu_{3:u} \),
       \( \mu_{4:x} = c^4\mu_{4:u} \).

The first of the relations given in (7) is proved below. The others
may be proved in a similar manner.

\( \mu_{2:x} = \dfrac{1}{N}\sum f_i (x_i - \bar{x})^2 \), by definition,

\( \quad\;\; = \dfrac{1}{N}\sum f_i (x_0 + cu_i - x_0 - c\bar{u})^2 \), by (4a) and (5), Chapter III,

\( \quad\;\; = c^2\cdot\dfrac{1}{N}\sum f_i (u_i - \bar{u})^2 = c^2\mu_{2:u} \).

We see that the indirect method of computing the μ's (in the u unit)
involves two steps. First the ν's are computed according to the
definitions in (2). This step is illustrated in Table 18. Then we
calculate the μ's by substituting the computed ν's in relations (4),
(5), and (6). The μ's in the x unit could then be obtained, if desired,
by means of (7).

Before proceeding with the second step it is desirable to check the
ν's or, at least, the totals of the columns from which they are obtained.
This can be done if we have another column headed
\(f(u + 1)^4\), and observe that

\( \sum f(u+1)^4 = \sum fu^4 + 4\sum fu^3 + 6\sum fu^2 + 4\sum fu + \sum f \).

This is known as Charlier's check. An alternative one is to check the
entries in the column \(fu^4\) against the proper entries in Pearson's
Tables for Statisticians and Biometricians, Table L.

Charlier’s check is a necessary but not a sufficient check. That is 
to say, compensating errors may occur which this check would not 
detect. However, the occurrence of such errors is very unlikely. 

Applying Charlier’s check to Table 18 we have 

1088 + 4(-236) + 6(176) + 4(-20) + 100 = 1220. 
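Charlier's check for Table 18 can be reproduced in Python from the u and f columns alone. The helper `column_total` is a hypothetical name; it forms the total of the column f(u + shift)^power.

```python
u = [-4, -3, -2, -1, 0, 1, 2]     # u column of Table 18
f = [2, 3, 11, 20, 32, 25, 7]     # frequencies of Table 18

def column_total(power, shift=0):
    """Total of the column f * (u + shift) ** power."""
    return sum(fi * (ui + shift) ** power for ui, fi in zip(u, f))

# Totals of the columns f, fu, fu^2, fu^3, fu^4.
totals = [column_total(p) for p in range(5)]

# Charlier's check: the binomial expansion of sum f (u + 1)^4.
left = column_total(4, shift=1)
right = totals[4] + 4 * totals[3] + 6 * totals[2] + 4 * totals[1] + totals[0]
print(totals, left, right)
```

Agreement of the two sides confirms the column totals in the sense described above: it is a necessary condition only, since compensating errors would pass unnoticed.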
Table 18. Moments for Distribution of Grades

       Data                           Computations
    x        f       u      fu     fu^2     fu^3     fu^4    f(u+1)^4
  34.5       2      -4      -8      32     -128      512       162
  44.5       3      -3      -9      27      -81      243        48
  54.5      11      -2     -22      44      -88      176        11
  64.5      20      -1     -20      20      -20       20         0
  74.5      32       0       0       0        0        0        32
  84.5      25       1      25      25       25       25       400
  94.5       7       2      14      28       56      112       567
  Sums     100             -20     176     -236     1088      1220
  Sums/N     1            -.20    1.76    -2.36    10.88    (for Charlier's check)
                         ν_1:u   ν_2:u    ν_3:u    ν_4:u



Hence we may proceed with confidence to compute the μ's. Using
relations (4), (5), and (6):

\( \mu_{2:u} = 1.76 - (-.20)^2 = 1.72 \)

\( \mu_{3:u} = -2.36 - 3(1.76)(-.20) + 2(-.20)^3 = -1.320 \)

\( \mu_{4:u} = 10.88 - 4(-2.36)(-.20) + 6(1.76)(-.20)^2 - 3(-.20)^4 = 9.4096. \)
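The indirect method can be compared with the direct definition in a short Python sketch using the Table 18 data. The function names are illustrative; the moments about zero in the u unit are computed first, relations (4), (5), and (6) are then applied, and the results are set against the moments about the mean computed directly.

```python
u = [-4, -3, -2, -1, 0, 1, 2]     # u column of Table 18
f = [2, 3, 11, 20, 32, 25, 7]     # frequencies of Table 18
N = sum(f)

def raw_moment(r):
    """Moment of order r about the u-origin, per unit frequency."""
    return sum(fi * ui ** r for ui, fi in zip(u, f)) / N

def central_moment(r):
    """Moment of order r about the mean, computed directly."""
    u_bar = raw_moment(1)
    return sum(fi * (ui - u_bar) ** r for ui, fi in zip(u, f)) / N

v1, v2, v3, v4 = (raw_moment(r) for r in (1, 2, 3, 4))

# Relations (4), (5), and (6).
mu2 = v2 - v1 ** 2
mu3 = v3 - 3 * v2 * v1 + 2 * v1 ** 3
mu4 = v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4

print(round(mu2, 4), round(mu3, 4), round(mu4, 4))
print(round(central_moment(2), 4), round(central_moment(3), 4), round(central_moment(4), 4))
```

Both routes give the same three values, apart from rounding in the last decimal places, so the labor saved by the indirect method costs nothing in accuracy.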

The following check, which can be handled readily on a machine,
may be used to check the μ's:

\( \nu_4 = \dfrac{1}{N}\sum f\,[(u - \nu_1) + \nu_1]^4 \)

\( \quad\; = \mu_4 + 4\mu_3\nu_1 + 6\mu_2\nu_1^2 + \nu_1^4 \).

Before explaining the applications of \(\mu_2\), \(\mu_3\), and \(\mu_4\) we present some
exercises which will aid the student in mastering the procedure thus
far developed.

Exercises 

1. (a) Verify relations (4), (5), and (6).

(b) Show that these relations hold also in the x unit.

(c) Prove that \(\mu_1 = 0\) in any unit.

(d) When f = 1, show that

1 ^ 

/»«:. “ t; 

N 
2. Verify the relations given in (7).

3. Using Table 18 as a model find the ν's for Iowa City rainfall by extending
Table 17.

4. Find the μ's from your results in Exercise 3 above.

6. Standard Deviation. Formula (4), 112 — vz — v\^y is perhaps 
the most important of the moment relations for elementary statistics. 
It states that the second moment about the mean is equal to the 
second moment about zero diminished by the square of the mean 
measured from zero. 

Many of the definitions in statistics are essentially those of physics 
and mechanics. The analogy between the mean and centroid has 
been mentioned. The above statement about formula (4) is a well- 
known proposition in mechanics when the word centroid is substi¬ 
tuted for mean. 

In mechanics the equivalent of iV/x 2 is called the moment of inertia 
(about the axis through the centroid) and is the radius of gyra^- 

tim. These notions are carried over in statistics. Suppose a thin 
metal plate in the shape of a histogram is rotating about a vertical 
axis through its centroid. There is a distance from the centroid at 
which the entire mass of the histogram could be concentrated 
without changing its moment of inertia. This distance is the 
square root of /Lt 2 . It is an average rotational radius for all par¬ 
ticles of the rotating mass. In statistics, is called the stand'- 

ard deviation and is denoted by the small Greek letter a. Therefore 
we have 



We shall see later that or is a measure of what is called dispersion. 
More precisely, it measures the extent to which the data are spread 
out on the average on either side of the mean. (See Figure 11.) 
The student will obtain a more complete understanding of a as the 
course develops. 

The mean and standard deviation are always expressed finally in 
the same units as the variates. If x represents inches, we desire the 
mean and standard deviation in inches. When obtained they should 
be labelled appropriately.

Fig. 11


Example. For Table 18, we have

    $\bar{x} = c\bar{u} + x_0 = 10(-.20) + 74.5 = 72.5\%$
    $\sigma_u = \sqrt{\mu_{2:u}} = (1.72)^{1/2} = 1.31$
    $\sigma_x = c\sigma_u = 10(1.31) = 13.1\%$

Thus, we have explained the use of the first and second moments.

The student will observe that the change from $\sigma_u$ to $\sigma_x$ does not
involve $x_0$. The standard deviation is affected by the change in units
but is independent of the origin of reference. To prove this let
$x' = x - x_0$, whence $\bar{x}' = \bar{x} - x_0$ (why?). Then

    $\sigma_{x'}^2 = \frac{1}{N} \sum f_i (x_i' - \bar{x}')^2$

                   $= \frac{1}{N} \sum f_i [x_i - x_0 - \bar{x} + x_0]^2$

                   $= \frac{1}{N} \sum f_i (x_i - \bar{x})^2$

                   $= \mu_{2:x} = \sigma_x^2$.

This suggests the more general 

Theorem. The value of $\mu_r$ remains invariant under a transformation
which changes only the origin of reference of the variates.

The student is asked to prove the equivalent of this theorem in 
Exercise 3 after §9. 
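The two properties just stated (independence of the origin, dependence on the unit) can be illustrated numerically. This is a small sketch using the class marks of Table 18 with $x_0 = 74.5$ and $c = 10$; the helper `std_dev` is an illustrative name, not notation from the text.

```python
from math import sqrt, isclose

x = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]   # class marks, Table 18
f = [2, 3, 11, 20, 32, 25, 7]                    # frequencies

def std_dev(values, freqs):
    """Standard deviation of a frequency distribution, from the definition."""
    n = sum(freqs)
    mean = sum(fi * v for fi, v in zip(freqs, values)) / n
    return sqrt(sum(fi * (v - mean) ** 2 for fi, v in zip(freqs, values)) / n)

x0, c = 74.5, 10.0
shifted = [v - x0 for v in x]        # change of origin only
u = [(v - x0) / c for v in x]        # change of origin and of unit

sigma_x = std_dev(x, f)
sigma_shifted = std_dev(shifted, f)
sigma_u = std_dev(u, f)
print(round(sigma_x, 2), round(sigma_shifted, 2), round(c * sigma_u, 2))
```

Shifting the origin leaves the standard deviation unchanged, while dividing the variates by $c$ divides it by $c$, so that $\sigma_x = c\sigma_u$.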

6. Standard Units. The above section explains $\mu_2$. There remains
the explanation of $\mu_3$ and $\mu_4$. We will lead up to this by
defining standard units. We have mentioned the transformation
$x' = x - \bar{x}$. Another very useful transformation consists in measuring
such deviations from the mean in units of the standard deviation,
$\sigma_x$, of the entire distribution. They are then known as standard
units and will be designated by $t$. Thus,

(9)    $t = \frac{x'}{\sigma_x} = \frac{x - \bar{x}}{\sigma_x}$.

Graphically, this translates the origin to the mean and measures
distances along the horizontal axis in terms of $\sigma_x$. It is a special case
of the more general transformation

    $u = \frac{x - x_0}{c}$.


The significant characteristic of the t variate is its independence of 
the unit in which the original measurements were taken. For ex¬ 
ample, suppose we were concerned with obtaining the linear measure¬ 
ments of a set of individuals. One distribution of variates would 
result if the measurements were made in feet. In this case $x'$, $\bar{x}$, and
$\sigma_x$ would also be in feet. If the measurements were taken in inches,
then $x'$, $\bar{x}$, and $\sigma_x$ would be in inches, and each of these values would
be, numerically, twelve times as large as the corresponding numbers
in the first distribution. However, the variates expressed in standard
units would be the same for the two distributions. Thus if

    $\bar{x} = 50$ ft. $= 50(12)$ in.,

and

    $\sigma_x = 5$ ft. $= 5(12)$ in.,

then for an individual measurement of $x = 60$ ft. $= 60(12)$ in., we
have

    $x' = 10$ ft. $= 10(12)$ in.,

    $t = \frac{10 \text{ ft.}}{5 \text{ ft.}} = \frac{10(12) \text{ in.}}{5(12) \text{ in.}} = 2$.


It is obvious, therefore, that standard units provide a basis for 
comparing distributions. Moreover, they make possible important 
simplifications in certain mathematical operations. 
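The feet-and-inches illustration above can be reproduced in a few lines. This is only a sketch; `standard_units` is a hypothetical helper name for relation (9).

```python
def standard_units(x, mean, sigma):
    """Relation (9): deviation from the mean in units of the standard deviation."""
    return (x - mean) / sigma

# The same individual, first measured in feet and then in inches.
t_feet = standard_units(60.0, 50.0, 5.0)
t_inches = standard_units(60.0 * 12, 50.0 * 12, 5.0 * 12)
print(t_feet, t_inches)
```

The factor 12 appears in both the deviation and the standard deviation and therefore divides out, which is why $t$ does not depend on the unit of measurement.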

With the aid of a computing machine, a distribution may be easily
transformed into standard units by means of the so-called continuous
process. To illustrate, suppose for the distribution of Table 9 (§11,
Chapter I), it has been found that

    $\bar{x} = 47.712$ lbs.

    $\sigma_x = 5.772$ lbs.

By relation (9), then,

    $t = \frac{x - 47.712}{5.772} = .17325x - 8.2661$.

Referring to the discussion of the continuous method given in the
Introduction, we observe that here $k = -8.2661$, $n = .17325$, and we
desire the values of $t$ corresponding to the values of $x$ given in Table 9.
For the values of $x$ such that $nx < |k|$, we write the above relation in
the form

    $-t = 8.2661 - .17325x$.

The procedure¹ now is to register 8.266100 on the product register,
punch the constant factor .17325 on the keyboard, and then by turning
the crank backward so that the successive values of $x$ appear on
the revolution register, we subtract from $|k|$ the products of this
multiplier and the values of $x$. The various values of $x$ are built over
from one to another without clearing the dial. The resulting values
of $-t$ are read at each stage from the product register until we get
$-t = 0.383$. From here, $nx > |k|$, so we clear the dials and start
over using the original form of the relation between $x$ and $t$. We now
register $-8.266100$ on the product register by turning the crank
backward, punch .17325 on the keyboard, and turn the crank forward
to form the values of $x$ on the revolution register. The values
of $t$ are read as before from the product register at each stage of the
build-over process. In this way the following set of standard variates
is obtained:

Table 19

     x        f         t
   29.5       1      -3.155
   33.5      14      -2.462
   37.5      56      -1.770
   41.5     172      -1.076
   45.5     245      -0.383
   49.5     263       0.310
   53.5     156       1.003
   57.5      67       1.696
   61.5      23       2.389
   65.5       3       3.082

¹ If automatic machines are available the instructor will explain the
procedure.

We see from Table 19 that a range of $t = \pm 3$ takes in practically all
the variates. This is typical of the more common distributions.
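On a modern machine the continuous process reduces to evaluating the linear relation $t = nx + k$ for each class mark. The sketch below assumes the rounded constants quoted in the text; the names `n`, `k`, and `to_standard_units` are illustrative.

```python
# Class marks of Table 9 (weights in lbs.), as listed in Table 19.
x_values = [29.5, 33.5, 37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]

n = 0.17325    # multiplier 1/5.772, as rounded in the text
k = -8.2661    # constant term -47.712/5.772, as rounded in the text

def to_standard_units(x):
    """Linear form of relation (9): t = n*x + k."""
    return n * x + k

t_values = [round(to_standard_units(x), 3) for x in x_values]
for x, t in zip(x_values, t_values):
    print(f"{x:5.1f}  {t:7.3f}")
```

The values obtained this way agree with Table 19 to within one unit in the third decimal place.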

If $\bar{x} = 0$, then $t = x/\sigma$ and the origin of $t$ is the same as the origin of
$x$. Some writers use $X$ to denote the variates (i.e., pounds, dollars,
temperatures, etc.), and use $x$ to denote deviations from the mean.
In that notation, $t = x/\sigma$ would have the same meaning as our
equation (9). Occasionally in later chapters we shall find it convenient
to designate deviations from the mean by $x$ (instead of $x'$). If so, it
will be stated that the origin of $x$ is at the mean or centroid.

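7. Moments in Standard Units. The moments in standard units
are denoted by the Greek letter alpha, $\alpha$. Thus for the $r$th moment
in standard units, we have $\alpha_r = \frac{1}{N} \sum f_i t_i^r$. However, it is not
necessary to transform the variates into $t$ units in order to compute the $\alpha$'s.
We shall show that they are functions of the $\mu$'s. Thus

    $\alpha_r = \frac{1}{N} \sum f_i t_i^r$                                   by definition

             $= \frac{1}{N} \sum f_i \left( \frac{x_i - \bar{x}}{\sigma_x} \right)^r$      from (9)

             $= \frac{1}{(\sigma_x)^r} \cdot \frac{1}{N} \sum f_i (x_i - \bar{x})^r$       Why?

             $= \frac{\mu_{r:x}}{(\sigma_x)^r}$.                                       Why?

Hence

(10)    $\alpha_r = \frac{\mu_{r:x}}{(\mu_{2:x})^{r/2}}$                              from (8).
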

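Letting $r = 1, 2, 3, 4$ in (10) we have

(10a)   $\alpha_1 = \frac{\mu_{1:x}}{\sigma_x} = 0$,

        $\alpha_2 = \frac{\mu_{2:x}}{\sigma_x^2} = 1$,

        $\alpha_3 = \frac{\mu_{3:x}}{\sigma_x^3}$,

        $\alpha_4 = \frac{\mu_{4:x}}{(\sigma_x)^4}$.
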

It is obvious that $\alpha_1$ and $\alpha_2$ are abstract numbers. This is also the
case for the other $\alpha$'s. In the expressions for $\alpha_3$ and $\alpha_4$ both
numerator and denominator are of the same dimension. That is to say, in
$\alpha_3 = \mu_3/\sigma^3$ both numerator and denominator are the cubes of
whatever unit is used in the original measurements, and therefore their
ratio is of zero dimension, a pure number. Similarly, in $\alpha_4 = \mu_4/\sigma^4$
both numerator and denominator are the fourth powers of the same
unit, and therefore $\alpha_4$ is an abstract number.

Some writers use $g_1$ instead of $\alpha_3$ and $g_2$ for $\alpha_4 - 3$.

8. Use of $\alpha_3$ and $\alpha_4$. Since $\alpha_1$ and $\alpha_2$ have the same values for all
frequency distributions, their computation contributes nothing to
the description or characterization of a distribution. But the values
of $\alpha_3$ and $\alpha_4$ depend upon the shape of the histogram representing a
distribution, and are therefore useful in distinguishing between types
of distributions. Thus, we observe that

    $\mu_3 = \frac{1}{N} \sum f_i (x_i - \bar{x})^3$

is a measure of asymmetry about the mean. If the variates are
distributed symmetrically about $\bar{x}$ then $\mu_3 = 0$. But if the positive
deviations from the mean outweigh the negative deviations then
$\mu_3 > 0$, whereas if the negative deviations predominate, then $\mu_3 < 0$.
Cubing the deviations gives a measure which is sensitive both to their
size and sign but the result is in cubic units. Now symmetry, or lack
of it, is not a function of the original units of measurement, so if we
divide $\mu_3$ by $\sigma^3$ we get a pure number. Thus $\alpha_3$ is a satisfactory
measure for comparing symmetry in distributions of different units of
measurement.

The quantity $\alpha_4$ measures a characteristic called “kurtosis.” It
refers to the relative number of variates in the vicinity of the mean.
More will be said about $\alpha_3$ and $\alpha_4$ later on. At this time, emphasis
should be placed upon their calculation rather than upon the
information which they yield.

Inasmuch as the $\alpha$'s are independent of the unit of measurement,
they may be computed from the moments in the $u$ unit. Changing
these moments into the $x$ unit would only introduce the same factor
into the numerator and denominator, which would of course divide
out. Thus:

    $\alpha_3 = \frac{\mu_{3:x}}{\sigma_x^3} = \frac{c^3 \mu_{3:u}}{c^3 \sigma_u^3} = \frac{\mu_{3:u}}{\sigma_u^3}$,

    $\alpha_4 = \frac{\mu_{4:x}}{\sigma_x^4} = \frac{c^4 \mu_{4:u}}{c^4 \sigma_u^4} = \frac{\mu_{4:u}}{\sigma_u^4}$.

For Table 18 we have

    $\alpha_3 = \frac{-1.320}{(1.72)(1.31)} = -0.586$,

    $\alpha_4 = \frac{9.4096}{(1.72)^2} = 3.18$.
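The Table 18 values can be recomputed from the frequencies themselves. The sketch below works entirely in the $u$ unit, as the text recommends; `central_moment` is an illustrative helper name.

```python
# Table 18 in class-interval units (x0 = 74.5, c = 10).
u = [-4, -3, -2, -1, 0, 1, 2]
f = [2, 3, 11, 20, 32, 25, 7]
N = sum(f)

mean_u = sum(fi * ui for fi, ui in zip(f, u)) / N

def central_moment(r):
    """r-th moment about the mean in the u unit."""
    return sum(fi * (ui - mean_u) ** r for fi, ui in zip(f, u)) / N

mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)
sigma_u = mu2 ** 0.5
alpha3 = mu3 / sigma_u ** 3
alpha4 = mu4 / sigma_u ** 4
print(round(mu2, 4), round(mu3, 4), round(mu4, 4))
print(round(alpha3, 3), round(alpha4, 3))
```

Carried at full precision, $\alpha_3$ comes out near $-0.585$; the text's figure uses $\sigma_u$ rounded to 1.31, which moves the last digit. The value of $\alpha_4$ agrees with 3.18.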


Although no limits can be placed on the possible values which $\alpha_3$
and $\alpha_4$ may take, it may be said that for the more common
distributions $\alpha_4$ fluctuates around 3 and $\alpha_3$ is usually not more than 2 nor
less than $-2$. We cannot go into the theoretical reasons for these
values and we mention them here merely to guide the student as to
what is a reasonable result to expect in the exercises in this book.
In this connection, the inequality*

    $\alpha_4 \geq \alpha_3^2 + 1$

may also prove useful. When the numerical value of $\alpha_3$ is large, the
distribution may be of the J-shaped type which is an extreme form
of the asymmetrical type. However, these types cannot always be
distinguished by elementary methods if the original data are not
available.

9. Summary. The quantities $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$ are called the
descriptive constants of the distribution. They (together with $N$) are
the “relatively few quantities” (§1) which, in certain cases,
contain all the relevant information in the distribution. Table 20 will
serve as a model for the procedure which the student should follow
in computing these quantities. Of course, if the work is done on a
computing machine, only the totals of the power sums need be
recorded. The detail of the columns may be omitted. In Table 20,
$c = 1$, so $\sigma_x = \sigma_u$. Obviously, this would not be true in general.

The calculation of the $\nu$'s proceeds naturally as an extension of the
work required to compute $\bar{x}$ for a frequency distribution. Thus to
obtain $\bar{x}$ we first compute $\nu_{1:u}$ and then obtain $\bar{x}$ from the relation

    $\bar{x} = c\bar{u} + x_0$.

To obtain the standard deviation we need the value of $\nu_2$ because $\sigma_x$
is found from the relations

    $\mu_2 = \nu_2 - \bar{u}^2$,    $\sigma_x = c\sqrt{\mu_{2:u}}$.

* “A Note on Skewness and Kurtosis,” J. Ernest Wilkins, Jr., Annals of
Mathematical Statistics, vol. 15 (1944), pp. 333-335.





The next chapter is devoted to a discussion of dispersion of which $\sigma_x$
is a measure. To be sure, the standard deviation is only one of several
measures of dispersion, just as the mean is only one of several
averages. But both the mean and the standard deviation play important
roles in the theory and practice of statistics. It is important to
master the pattern by which they are computed in a frequency
distribution.

In order to compute $\alpha_3$ and $\alpha_4$ we first require $\nu_3$ and $\nu_4$ (in addition
to $\nu_1$ and $\nu_2$). Then $\mu_3$ and $\mu_4$ are obtained from (5) and (6). Finally,

    $\alpha_r = \frac{\mu_{r:u}}{(\sigma_u)^r}$

is computed for $r = 3$ and $r = 4$. The characteristics of a
distribution which $\alpha_3$ and $\alpha_4$ describe will be discussed in Chapter VI and
again in Part II. In elementary work they are less important than
$\bar{x}$ and $\sigma_x$.

With regard to the number of decimal places to be retained in
computations, the author agrees with Dr. Shewhart who says: “It
does not appear feasible ... to lay down simple, practical, and
infallible rules.” Reasons in support of this opinion are stated in his
book,¹ pp. 79-80. For other remarks in this connection, the reader
is referred to the books by Walker and Scarborough which are cited
in our Introduction.


Exercises 

1. (a) What is the numerical value of the mean of any distribution of variates
expressed in $t$ units?

(b) What is the standard deviation of such a distribution? Hint: $\sigma_t = \sqrt{\alpha_2}$.

2. (a) Show that $(x - \bar{x}) = c(u - \bar{u})$ and hence that $t = (u - \bar{u})/\sigma_u$.

(b) Show that we obtain the same results for the $\alpha$'s if we take

    $t = \frac{u - \bar{u}}{\sigma_u}$.

3. Prove: If any constant is added algebraically to each variate of a series the
values of $\mu_r$ for the new series will be identical with the corresponding values
of $\mu_r$ of the original series.

4. Suppose each variate is multiplied by a constant. What effect would this
have on $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$?

¹ See footnote, p. 52.





5. Show that the standard deviation of x may be written 

-[]i 

6. Prove the general relation

    $\mu_{r:x} = c^r \mu_{r:u}$

of which the relations given in (7) are special cases when $r = 2, 3, 4$.
Hint: $(x - \bar{x}) = c(u - \bar{u})$.

7. (a) Show that $\alpha_0 = 1$.

(b) Show that v’' = 0*2)*"^^ in both the x and u units. 

8. Prove from (4) that $\mu_2$ is less than or at most equal to $\nu_2$, the same unit being
used in each case.

9. Find $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$ for Iowa City rainfall using your results from
Problem 4 of the preceding set of Exercises.

Ans. $\bar{x} = 2.80$ in., $\alpha_3 = 1.29$,
     $\sigma_x = 2.01$ in., $\alpha_4 = 4.58$.

10. Using Table 20 as a model find $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$ for the distributions in §11,
Chapter I, according to the direction of the instructor.

Table 20. Specimen Worksheets for Computing the Characterizing
Constants of a Distribution

Subject: Span among Adult Males (Table 13)

    x        f      u      uf      u²f       u³f       u⁴f    (u+1)⁴f
  58.5       1    -11     -11      121    -1,331    14,641     10,000
  59.5       2    -10     -20      200    -2,000    20,000     13,122
  60.5       1     -9      -9       81      -729     6,561      4,096
  61.5       6     -8     -48      384    -3,072    24,576     14,406
  62.5       7     -7     -49      343    -2,401    16,807      9,072
  63.5      22     -6    -132      792    -4,752    28,512     13,750
  64.5      55     -5    -275    1,375    -6,875    34,375     14,080
  65.5     111     -4    -444    1,776    -7,104    28,416      8,991
  66.5     146     -3    -438    1,314    -3,942    11,826      2,336
  67.5     182     -2    -364      728    -1,456     2,912        182
  68.5     229     -1    -229      229      -229       229          0
  69.5     265      0       0        0         0         0        265
  70.5     263      1     263      263       263       263      4,208
  71.5     217      2     434      868     1,736     3,472     17,577
  72.5     176      3     528    1,584     4,752    14,256     45,056
  73.5     132      4     528    2,112     8,448    33,792     82,500
  74.5      82      5     410    2,050    10,250    51,250    106,272
  75.5      48      6     288    1,728    10,368    62,208    115,248
  76.5      20      7     140      980     6,860    48,020     81,920
  77.5      16      8     128    1,024     8,192    65,536    104,976
  78.5      12      9     108      972     8,748    78,732    120,000
  79.5       3     10      30      300     3,000    30,000     43,923
  80.5       1     11      11      121     1,331    14,641     20,736
  81.5       2     12      24      288     3,456    41,472     57,122
  82.5       1     13      13      169     2,197    28,561     38,416
  Sums   2,000            886   19,802    35,710   661,058    928,254
  (Sums)/N               .443    9.901    17.855   330.529
                           ū       ν₂        ν₃        ν₄

Charlier's check:

    $\sum (u+1)^4 f = \sum u^4 f + 4\sum u^3 f + 6\sum u^2 f + 4\sum u f + \sum f$

    928,254 = 661,058 + 4(35,710) + 6(19,802) + 4(886) + 2,000 = 928,254

Computations:

    $\bar{x} = c\bar{u} + x_0 = (1)(.443) + 69.5 = 69.943$ in.
    $\bar{u}^2 = .196249$
    $\bar{u}^3 = .086938$,  $\bar{u}^4 = .038514$

    $\mu_2 = \nu_2 - \bar{u}^2$
           $= 9.901 - .196249 = 9.704751$
    $\sigma_u = \sqrt{9.704751} = 3.115$
    $\sigma_x = c\sigma_u = (1)(3.115) = 3.115$ in.

    $\mu_3 = \nu_3 - 3\nu_2\bar{u} + 2\bar{u}^3$
           $= 17.855 - 3(9.901)(.443) + 2(.086938)$
           $= 17.855 - 13.158429 + .173876$
           $= 4.870447$

    $\mu_4 = \nu_4 - 4\nu_3\bar{u} + 6\nu_2\bar{u}^2 - 3\bar{u}^4$
           $= 330.529 - 4(17.855)(.443) + 6(9.901)(.196249) - 3(.038514)$
           $= 330.529 - 31.639060 + 11.658368 - .115542$
           $= 310.432766$

    $\sigma_u^3 = (3.115)(9.704751) = 30.230299$
    $\sigma_u^4 = (9.704751)^2 = 94.182192$

    $\alpha_3 = \frac{4.870447}{30.230299} = .161$

    $\alpha_4 = \frac{310.432766}{94.182192} = 3.296$

Summary:

    $\bar{x} = 69.943$ in.;    $\alpha_3 = 0.161$;
    $\sigma_x = 3.115$ in.;    $\alpha_4 = 3.296$.
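The whole of the Table 20 worksheet, including Charlier's check, can be recomputed from the frequency column alone. The following is a sketch under the assumptions $x_0 = 69.5$ and $c = 1$ taken from the table; the helper `power_sum` and the variable names are illustrative.

```python
# Frequencies of Table 20 for u = -11, -10, ..., 13 (x0 = 69.5, c = 1).
f = [1, 2, 1, 6, 7, 22, 55, 111, 146, 182, 229, 265, 263,
     217, 176, 132, 82, 48, 20, 16, 12, 3, 1, 2, 1]
u = list(range(-11, 14))
x0, c = 69.5, 1.0
N = sum(f)

def power_sum(r, shift=0):
    """Sum of f * (u + shift)^r over all classes."""
    return sum(fi * (ui + shift) ** r for fi, ui in zip(f, u))

sums = {r: power_sum(r) for r in range(1, 5)}

# Charlier's check: expand (u + 1)^4 by the binomial theorem.
charlier_lhs = power_sum(4, shift=1)
charlier_rhs = sums[4] + 4 * sums[3] + 6 * sums[2] + 4 * sums[1] + N

# Moments about zero, then about the mean, by formulas (4), (5), (6).
nu1, nu2, nu3, nu4 = (sums[r] / N for r in range(1, 5))
mu2 = nu2 - nu1 ** 2
mu3 = nu3 - 3 * nu2 * nu1 + 2 * nu1 ** 3
mu4 = nu4 - 4 * nu3 * nu1 + 6 * nu2 * nu1 ** 2 - 3 * nu1 ** 4

mean_x = c * nu1 + x0
sigma_x = c * mu2 ** 0.5
alpha3 = mu3 / mu2 ** 1.5
alpha4 = mu4 / mu2 ** 2
print(round(mean_x, 3), round(sigma_x, 3), round(alpha3, 3), round(alpha4, 3))
```

If the two sides of Charlier's check disagree, an arithmetic slip in one of the power-sum columns is the likely cause, which is the purpose the check serves in hand computation.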


10. Sheppard's Corrections. The moments of a frequency
distribution are computed on the assumption that each variate value in
a class interval has the value of the class mark for that interval. This
has the effect of replacing the actual data by somewhat fictitious data
assigned arbitrarily at the central values of the intervals. Evidently
a very coarse grouping might be misleading and it can be shown
mathematically that the above assumption introduces a systematic error,
called a grouping error, in the results obtained for the second and
fourth moments about the mean but does not affect $\mu_1$ and $\mu_3$. To
eliminate this systematic tendency certain corrections are applied to
$\mu_2$ and $\mu_4$.

The derivation of these corrections is beyond the scope of an
elementary course, but it may be worth while to see why it is that
corrections are necessary for some moments and not for others. The
following argument is intended only as a pedagogical device to give
a plausible explanation. Suppose a smooth curve represents the
true frequency distribution while the histogram represents the
distribution with class marks as the variates. Since the moments are
computed from the distribution represented by the histogram, we
scarcely expect our results to be exactly the values of the moments
of the true distribution, which are, of course, what we seek. In using
the distribution represented by the histogram, we are neglecting, for
each rectangle, the little area under the curve shaded A and
substituting for it the little area shaded B. Suppose that, in general,
B is a little larger than A, as shown in Figure 12. The excess of B
over A for those rectangles to the left of $\bar{x}$ will be negative; the
corresponding excess for those rectangles to the right of $\bar{x}$ will be
positive. This may be readily understood by considering these little areas
as approximate triangles whose bases are negative or positive
according as they are to the left or right of $\bar{x}$. These excesses for all the
rectangles, both positive and negative, are involved in taking the
summation $\sum f_i (x_i - \bar{x})^r$ for the moments. When $r$ is an odd number,
as 1 or 3, the excesses show up with their algebraic signs and
therefore, over the range of the distribution, the positive excesses just
about offset the negative ones. But in the case of the even moments,
all the excesses now become positive so that the errors accumulate
and the final results for these moments are too large.

Fig. 12

To reduce these errors due to grouping, W. F. Sheppard has
demonstrated¹ that the following corrections should be applied. It should
be noticed that as we state them here they should be applied only
where the class interval is unity, i.e., in the $u$ unit.

    Corrected $\mu_{2:u}$ = uncorrected $\mu_{2:u} - \frac{1}{12}$
    Corrected $\mu_{3:u}$ = uncorrected $\mu_{3:u}$
    Corrected $\mu_{4:u}$ = uncorrected $\mu_{4:u} - \frac{1}{2}$(uncorrected $\mu_{2:u}$) $+ \frac{7}{240}$

    ($\frac{1}{12} = 0.08333$,  $\frac{7}{240} = 0.02917$.)

¹ Students familiar with more advanced mathematics will find an interesting
discussion of systematic errors and references to papers dealing with Sheppard's
corrections in an article by H. C. Carver, Annals of Mathematical Statistics,
vol. 7, p. 164.

Example. For Table 18 we have

    Corrected $\mu_{2:u} = 1.720 - 0.083 = 1.637$
              $\sigma_u = \sqrt{1.637} = 1.28$
    Corrected $\sigma_x = 10(1.28) = 12.8\%$
    Corrected $\mu_{4:u} = 9.4096 - (1.72)/2 + 7/240 = 8.5788$
    $\therefore \alpha_4 = 8.5788/(1.637)^2 = 3.20$

The values of $\bar{x}$ and $\mu_3$ remain unchanged.
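The corrections are simple enough to state as a function. This is a minimal sketch for moments expressed in the $u$ unit, applied to the uncorrected Table 18 moments quoted above; `sheppard_correct` is a hypothetical name.

```python
def sheppard_correct(mu2, mu3, mu4):
    """Sheppard's corrections for moments in class-interval (u) units.

    The second and fourth moments are reduced; the third is unchanged.
    """
    return mu2 - 1 / 12, mu3, mu4 - mu2 / 2 + 7 / 240

# Uncorrected moments of Table 18 in the u unit, with c = 10.
mu2_c, mu3_c, mu4_c = sheppard_correct(1.72, -1.320, 9.4096)
sigma_u = mu2_c ** 0.5
sigma_x = 10 * sigma_u
alpha4 = mu4_c / mu2_c ** 2
print(round(mu2_c, 3), round(sigma_x, 1), round(mu4_c, 4), round(alpha4, 2))
```

The results agree with the Example to the number of places shown there. Note that the uncorrected $\mu_2$ is used inside the correction for $\mu_4$.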


Sheppard's corrections are valid only for the bell-shaped types of
distributions. They are not applicable to the J-shaped or U-shaped
types. Moreover, they constitute a refinement which may not
always be consistent with the degree of accuracy in the original data.
The errors of grouping (not mistakes) are usually small compared
with the errors existing in the raw data. So, it seems that little
would be gained by their use in a first course. We will occasionally
use them in an illustration.



CHAPTER V 

MEASURES OF DISPERSION 

1. Introduction. The concept of variability is fundamental today
not only in the social sciences but also in the so-called exact physical
sciences. Modern scientific method recognizes the existence of
physical, moral, and mental inequalities. The principle of variability
has come to be accepted as the natural order in social, economic,
and physical phenomena. This principle is the very essence of the
statistical nature of mass phenomena. In this connection, R. A.
Fisher says:¹

The conception of statistics as the study of variation is the natural outcome of 
viewing the subject as the study of populations; for a population of individuals 
in all respects identical is completely described by a description of any one indi¬ 
vidual, together with the number in the group. The populations which are the 
object of statistical study always display variation in one or more respects.
To speak of statistics as the study of variation also serves to emphasize the
contrast between the aims of modern statisticians and those of their predecessors.
For, until comparatively recent times, the vast majority of workers in this field 
appear to have had no other aim than to ascertain aggregate, or average, values. 
The variation itself was not an object of study, but was recognized rather as a 
troublesome circumstance which detracted from the value of the average. . . . Yet, 
from the modern point of view, the study of the causes of variation of any vari¬ 
able phenomena, from the yield of wheat to the intellect of man, should be 
begun by the examination of the variation which presents itself. The study 
of variation leads immediately to the concept of a frequency distribution. 

It is clearly important, therefore, in studying a distribution, to 
describe how the variates are clustered or scattered around an aver¬ 
age. Figure 13 shows how two distributions may even have the same 
mean and total frequency, yet differ considerably in variation from 
the mean. Such variation is commonly called dispersion, varia¬ 
bility, or spread. 

We will consider three measures of dispersion: Quartile Deviation,
Mean Deviation, and Standard Deviation, of which the last is by far
the most important.

¹ R. A. Fisher, Statistical Methods for Research Workers, p. 3. Oliver and Boyd,
London.



Fig. 13. Two Distributions Differing in Dispersion 


2. The Quartile Deviation. Just as the median selects one point 
of division, we may now take two additional points such that they, 
together with the median, divide the whole distribution into four 
equal parts. These points are called the quartile values. 

The first quartile, denoted by $Q_1$, is that value of $x$ for which
cum $f = N/4$. That is, one-fourth of all the variates in the
distribution are smaller in value than $Q_1$ and three-fourths of them are larger
than $Q_1$. The second quartile $Q_2$ is that value of $x$ for which cum $f$
is $N/2$ and is therefore the median. The third quartile, denoted
by $Q_3$, is that value of $x$ for which cum $f = 3N/4$. Hence fifty per
cent of the total frequency is included between $Q_1$ and $Q_3$.

Half of the distance between $Q_3$ and $Q_1$ is called the
semi-interquartile range or quartile deviation and will be denoted
by $Q$. Thus,

(1)    $Q = \frac{Q_3 - Q_1}{2}$.

It should be noted that the median does not necessarily come at the
mid-point of $2Q$, i.e., that a distance $Q$ laid off on either side of
$Q_2$ would not necessarily reach to $Q_1$ and $Q_3$. (See Figure 14.) (For
a symmetrical distribution, to be considered later, this would
be true.)

As a measure of dispersion, $Q$ gives a fairly good idea of the spread
of the variates, and is suitable as such a measure in those cases where
the median would be used as an average. The quartile values $Q_1$
and $Q_3$ are found, like the median, by interpolation in the cumulative
frequency table.

Example. (a) Find the median and the quartile deviation for the distribution
of IQ's in Table 6 (§10, Chapter I). (b) Illustrate the measures found in (a) by
means of a cum $f$ graph.


  End-x     Cum f
   54.5        0
   64.5        3
   74.5       24
   84.5      102
                      ← $Q_1$
   94.5      284
                      ← Med.
  104.5      589
                      ← $Q_3$
  114.5      798
  124.5      879
  134.5      900
  144.5      905 = N

Solution:

    $N/4 = 226.25$,    $N/2 = 452.5$,    $3N/4 = 678.75$

    $\frac{Q_1 - 84.5}{10} = \frac{226.25 - 102}{284 - 102}$,      $Q_1 = 91.3$

    $\frac{Q_2 - 94.5}{10} = \frac{452.5 - 284}{589 - 284}$,       $Q_2 = 100.02$

    $\frac{Q_3 - 104.5}{10} = \frac{678.75 - 589}{798 - 589}$,     $Q_3 = 108.8$

    $Q = \frac{Q_3 - Q_1}{2} = 8.75$.
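The interpolation used in the Solution can be sketched as a small routine over the cumulative frequency table. The function name `value_at_cum` is illustrative; it assumes the class end-points and cumulative frequencies listed above.

```python
from bisect import bisect_left

# Upper class end-points and cumulative frequencies for the IQ distribution.
end_x = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5, 134.5, 144.5]
cum_f = [0, 3, 24, 102, 284, 589, 798, 879, 900, 905]

def value_at_cum(target):
    """Value of x at which cum f equals target, by linear interpolation."""
    j = bisect_left(cum_f, target)   # first index with cum_f[j] >= target
    if j == 0:
        return end_x[0]
    lo_x, hi_x = end_x[j - 1], end_x[j]
    lo_c, hi_c = cum_f[j - 1], cum_f[j]
    return lo_x + (hi_x - lo_x) * (target - lo_c) / (hi_c - lo_c)

N = cum_f[-1]
q1, q2, q3 = (value_at_cum(N * k / 4) for k in (1, 2, 3))
Q = (q3 - q1) / 2
print(round(q1, 1), round(q2, 2), round(q3, 1), round(Q, 2))
```

The quartiles agree with the Solution. Because the text halves the rounded quartiles, its value of $Q$ may differ from the unrounded one in the second decimal place. The same routine would give a percentile by calling `value_at_cum(m * N / 100)`.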


Figure 15 explains graphically the measures obtained by
interpolation from a cum $f$ table. For convenience in drawing the figure,
the quartile labels are put on vertical lines. But one should
remember that the quartiles are values of $x$ and that it is the horizontal
distances of the lines from the $y$-axis that represent these measures.

Exercises

1. Criticize the following “definitions”:

    $Q_1 = \frac{N}{4}$,    $Q_3 = \frac{3N}{4}$.

2. Find $Q_1$ and $Q_3$ from the cumulative frequency table which you made to
obtain the median for the Glasgow schoolgirl distribution. (Exercise 5
on page 52.)

3. Find the quartile deviation $Q$ from your results in Exercise 2.

4. Find $Q_1$, $Q_2$, $Q_3$ for the distribution in Table 12, and compute $Q$.

5. Compute the value of the semi-interquartile range for other distributions
at the direction of the instructor.

6. The $m$th percentile $P_m$ of a frequency distribution is that value of the
variable $x$ for which cum $f = mN/100$, where $m = 1, 2, \ldots, 99$. The 10th,
20th, 30th, $\ldots$, percentiles are called deciles. Therefore, the $n$th decile
$D_n$ is that value of $x$ for which cum $f = nN/10$, where $n = 1, 2, \ldots, 9$.
Compute several percentiles and deciles of a distribution in the text.

3. Mean Deviation. As a measure of variation about a central
value, it would seem appropriate to take an average of all the
deviations about that central value. In the mean deviation (MD) about
the mean this is precisely what we do, namely, we find the arithmetic
mean of the numerical values of the deviations about the mean.
In summing the deviations, their absolute values are used because
regardless of whether deviations are positive or negative they have
the same influence on the amount of variation. Moreover, if their
algebraic signs are taken account of, the sum of such deviations is
zero (Theorem VI of Chapter III). Hence we sum them treating
all deviations as positive.

In mathematical symbols, vertical bars denote absolute values,
so we have¹

(2)    $MD = \frac{1}{N} \sum f_i |x_i - \bar{x}|$,

if the $x$ unit is used. When the class interval is the unit, we have

(3)    $MD = \frac{1}{N} \sum f_i |u_i - \bar{u}|$

and

(4)    MD ($x$ unit) $= c \times$ MD ($u$ unit).

It can be proved that the essentially positive function

    $y = \sum f_i (x_i - A)^2$

is a minimum when $A = \bar{x}$. (See Theorem II, page 99. Also by
the calculus $dy/dA = 0$ when $A = \bar{x}$.) It was in a similar
investigation to find the value of $B$ for which the function

    $\sum f_i |x_i - B|$

is a minimum, that the median was discovered. When $B$ is the
median this function is a minimum.² This property of the median has
some statistical importance in connection with the geographical
location of centers of industry and population.³ Custom has
established the use of the mean rather than the median in this measure.
Hence “mean deviation” usually refers to the mean deviation from
the mean. It is also called “average deviation.”

¹ Since all the data are not concentrated at the midpoints of the intervals, a
grouping error is involved here as in the formula for $\sigma$ (§10, Chapter IV). But
the mean deviation is used so infrequently that discussion here of the appropriate
correction hardly seems warranted. Those who may be interested will find a
more precise formula in the Handbook of Mathematical Statistics, by Rietz and
others.

² For a proof see reference 16, our Introduction.

³ See p. 85 of Elements of Statistics, by Davis and Nelson. Principia Press.

Example. Find the mean deviation for the grades in Table 18 where the
mean value of $x$ is 72.5.

     x        f     $|x - \bar{x}|$    $f|x - \bar{x}|$
   34.5       2         38              76
   44.5       3         28              84
   54.5      11         18             198
   64.5      20          8             160
   74.5      32          2              64
   84.5      25         12             300
   94.5       7         22             154
  Total     100                       1036

    $MD = \frac{1036}{100} = 10.36$.

What was $\sigma$ for this distribution?
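The worked Example follows formula (2) directly, and the closing question can be answered in the same pass. This sketch uses the Table 18 class marks and frequencies; the variable names are illustrative.

```python
x = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]   # class marks, Table 18
f = [2, 3, 11, 20, 32, 25, 7]                    # frequencies
N = sum(f)

mean = sum(fi * xi for fi, xi in zip(f, x)) / N

# Formula (2): mean of the absolute deviations about the mean.
md = sum(fi * abs(xi - mean) for fi, xi in zip(f, x)) / N

# Standard deviation from its definition, for comparison.
sigma = (sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x)) / N) ** 0.5
print(mean, round(md, 2), round(sigma, 1), round(md / sigma, 2))
```

The mean deviation is smaller than the uncorrected standard deviation of 13.1% found for Table 18 in Chapter IV, since squaring gives more weight to the larger deviations.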

The absolute value of a variable $x'$, denoted by the symbol $|x'|$, is
not very tractable in mathematical operations. Therefore the mean
deviation is not favored by mathematicians since it is unwieldy in
the more theoretical and mathematical discussions. Its chief use
is in experimental work where occasional large and erratic deviations
are likely to occur. In such cases the standard deviation would tend
to emphasize these deviations.

If $m$ of the $N$ variates are greater than the mean, $\bar{x}$, then the mean
deviation may be written

    $MD = \frac{2}{N} \left[ (\text{sum of variates greater than } \bar{x}) - m\bar{x} \right]$

       $= \frac{2}{N} \left( \sum_{x_i > \bar{x}} x_i - m\bar{x} \right)$.

The student is given a hint in Exercise 34 at the end of Part I on
how to prove a similar formula for $x_i < \bar{x}$.
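The one-sided formula can be compared with the definition on the Table 18 data. This is a sketch only; for a frequency distribution, $m$ is taken here as the total frequency of the classes whose marks exceed the mean, and the function names are illustrative.

```python
x = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]   # class marks, Table 18
f = [2, 3, 11, 20, 32, 25, 7]                    # frequencies

def mean_deviation(x, f):
    """Mean deviation about the mean, from the definition."""
    n = sum(f)
    mean = sum(fi * xi for fi, xi in zip(f, x)) / n
    return sum(fi * abs(xi - mean) for fi, xi in zip(f, x)) / n

def mean_deviation_one_sided(x, f):
    """MD = (2/N) * (sum of variates above the mean - m * mean)."""
    n = sum(f)
    mean = sum(fi * xi for fi, xi in zip(f, x)) / n
    above = [(xi, fi) for xi, fi in zip(x, f) if xi > mean]
    m = sum(fi for _, fi in above)                 # number of variates above the mean
    total_above = sum(fi * xi for xi, fi in above)
    return 2 * (total_above - m * mean) / n

md_definition = mean_deviation(x, f)
md_one_sided = mean_deviation_one_sided(x, f)
print(round(md_definition, 2), round(md_one_sided, 2))
```

The two agree because the positive and negative deviations about the mean have equal totals, so doubling either side gives the full sum of absolute deviations.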

4. The Standard Deviation. To overcome the difficulty of negative
deviations and the use of absolute value signs, the deviations
about the mean may be squared and the mean of these squares taken.
To get back into the original linear units, we take the positive square
root of this result, and have

(5)    $\sigma = \left[ \frac{1}{N} \sum f_i (x_i - \bar{x})^2 \right]^{1/2}$

as defined before. The standard deviation measures the same kind
of phenomenon as the mean deviation and this approach to it is
frequently satisfactory to a student who otherwise finds it difficult
to understand.¹

For a common type of distribution, the standard deviation is
approximately twenty-five per cent greater than the mean deviation.
Speaking more accurately, this is true of a normal distribution (to be
considered in Chapter VI) for which the relation is MD $= \frac{4}{5}\sigma$
(approximately).
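For reference, the exact ratio for a normal distribution is $\sqrt{2/\pi}$, a standard result that the text rounds to four fifths. A one-line check (the variable names are illustrative):

```python
from math import sqrt, pi

md_over_sigma = sqrt(2 / pi)        # exact ratio MD / sigma for a normal distribution
sigma_over_md = 1 / md_over_sigma   # how much larger sigma is than MD
print(round(md_over_sigma, 4), round(sigma_over_md, 4))
```

The first value is close to 0.8 and the second close to 1.25, which is the "twenty-five per cent greater" of the text.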

It is often convenient to have a name for the “square of the
standard deviation,” and for this purpose the term “variance” has
been introduced. Thus $\sigma$ denotes standard deviation and $\sigma^2$
denotes variance.

Although definition (5) is the basic concept which the student
should have for the standard deviation, nevertheless in actual
practice it is seldom desirable to compute $\sigma$ directly from that definition.
For a frequency distribution the method is shown in the chapter on
moments. However, we will give an additional illustration here.

Example. Find the mean and the standard deviation of Table 9, using
Charlier's check and Sheppard's correction.

Solution: (See Table 21, p. 88.)

Charlier's check:

    $\sum f(u + 1)^2 = \sum fu^2 + 2\sum fu + N$

    2471 = 2365 + 2(-447) + 1000 = 2471

Computations:

    $\bar{x} = 49.5 + 4(-.447) = 47.712$ lbs.

    $\mu_{2:u} = \nu_2 - (\bar{u}^2) = 2.165$

Using Sheppard's corrections,

    Corrected $\mu_{2:u} = 2.165 - .083 = 2.082$
    $\mu_{2:x} = 16(2.082) = 33.312$
    $\sigma_x = \sqrt{33.312} = 5.772$ lbs.

¹ The term “standard deviation” was proposed by Pearson and is now used by
almost all English writers. As originally defined by Pearson, this is the square
root of the mean of the squares of deviations taken from the mean of the
distribution, and is not to be used when deviations are measured from any other
reference point. Pearson uses the term “root-mean-square” for a similar
measure when the deviations are taken around any origin other than the mean.
(Walker, History of Statistical Method, p. 54.)

Table 21. Weights of Glasgow School Children

  Weight (x)      f      u      fu     fu²    f(u+1)²
   29.5 lbs.      1     -5      -5      25       16
   33.5          14     -4     -56     224      126
   37.5          56     -3    -168     504      224
   41.5         172     -2    -344     688      172
   45.5         245     -1    -245     245        0
   49.5         263      0       0       0      263
   53.5         156      1     156     156      624
   57.5          67      2     134     268      603
   61.5          23      3      69     207      368
   65.5           3      4      12      48       75
   Sums        1000           -447    2365     2471
   (Sums)/N                          2.365
                                       ν₂
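The worksheet of Table 21 and the computations in the Solution can be reproduced as follows. This is a sketch assuming $x_0 = 49.5$ and $c = 4$ as in the table; the variable names are illustrative.

```python
from math import sqrt

x = [29.5, 33.5, 37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]  # class marks (lbs.)
f = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]                    # frequencies
x0, c = 49.5, 4.0

u = [round((xi - x0) / c) for xi in x]   # class-interval units: -5, ..., 4
N = sum(f)

sum_fu = sum(fi * ui for fi, ui in zip(f, u))
sum_fu2 = sum(fi * ui ** 2 for fi, ui in zip(f, u))
sum_f_u1_sq = sum(fi * (ui + 1) ** 2 for fi, ui in zip(f, u))

# Charlier's check for the second-power column.
charlier_ok = sum_f_u1_sq == sum_fu2 + 2 * sum_fu + N

mean_x = x0 + c * sum_fu / N
mu2_u = sum_fu2 / N - (sum_fu / N) ** 2
mu2_u_corrected = mu2_u - 1 / 12          # Sheppard's correction, u unit
sigma_x = c * sqrt(mu2_u_corrected)
print(charlier_ok, round(mean_x, 3), round(mu2_u, 3), round(sigma_x, 2))
```

Carrying full precision instead of rounding at each step changes $\sigma_x$ only in the third decimal place.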



It will be proved later that for a certain ideal type of distribution
which is often approximated in practical statistics the range $\bar{x} \pm \sigma_x$
includes about two thirds of the variates. Assuming the above
distribution is of this type we could say that about two thirds of the
children weighed between 42 pounds and 53.5 pounds. Such a
statement assists one in comprehending certain characteristics of the data
though the distribution actually may not be before him.

It is understood that the method of computation described above
is to be used when the class marks are equispaced. If the class
intervals are unequal we must choose $c = 1$ unless the $x$'s denoting
the class marks have a common factor $c$. When $c = 1$, $u$ becomes
$u = x - x_0$, and the work may be simplified a little by an appropriate
choice of $x_0$.





















Exercises 

1. (Pearson). The following data represent the percentage of ash content in
280 wagon tests of a certain kind of coal. Find the mean and the standard
deviation of the distribution:

Percentage ash content    Frequency
 3.0- 3.9                     1
 4.0- 4.9                     7
 5.0- 5.9                    28
 6.0- 6.9                    78
 7.0- 7.9                    84
 8.0- 8.9                    45
 9.0- 9.9                    28
10.0-10.9                     7
11.0-11.9                     2

Ans. $\bar{x} = 7.35\%$, $\sigma_x = 1.36\%$.

2. (Camp). Find the mean wage and the standard deviation of the following
data:

Class            Frequency
$4.50- 5.99          43
 6.00- 7.49          99
 7.50- 8.99         152
 9.00-10.49         178
10.50-11.99         160
12.00-13.49          40
13.50-14.99          25
15.00-16.49           3

Ans. $N = 700$, $\bar{x} = \$9.42$, $\sigma_x = \$2.19$.

3. Given $\sigma_x = 2.19$ for the following $(x, f)$ distribution, find
$\sigma_v$ and $\sigma_u$ for the $(v, f)$ and $(u, f)$ distributions,
respectively.

$f$    43    99    152   178   160   40    25    3
$x$    0     1.5   3.0   4.5   6.0   7.5   9.0   10.5
$v$    0     1     2     3     4     5     6     7
$u$   -3    -2    -1     0     1     2     3     4

What relation and theorem in Chapter IV does this illustrate?

4. Find the variance of Table 16 (§8, Chapter III).

5. Compute the value of the ratio $MD/\sigma$ for the data in Exercise 1 above.

6. Find the mean and standard deviation for the data in Table 10.

7. Find the mean and standard deviation for the data in Table 11.

8. Transform the variates of the following distribution into standard units:

$x$    2    4    6    8    10    12    14    16    18    20
$f$    1    9    36   84   126   126   84    36    9     1
$t$                              1/3   1     5/3   7/3   3

(The entries in the $t$ row are some of the answers.)


5. Relative Dispersions. The full significance of different values
of $\sigma$ can be obtained only by experience, but it is obvious that a small
standard deviation indicates that the variates are closely clustered
about the mean; whereas a large standard deviation indicates that
these values are spread out widely from the mean. (See Figure 13.)

The size of variates usually influences not only the mean but also
deviations from the mean. In other words, the magnitudes of the
deviations from the mean seem to be dependent, in some degree, upon
the magnitude of the mean. In comparing dispersion in distributions,
we may correct for differences in the average magnitudes of
positive variates by taking the ratio of the standard deviation to the
mean. Thus, the quantity

(6) $V = \frac{\sigma}{\bar{x}}$

is known as the coefficient of variation. It is obviously an abstract
number, being independent of the units of measurement, and it is
usually expressed as a percentage.

The use of (6) may be misleading in situations where the origin
from which the data are measured is somewhat arbitrary. Cases
in point are temperature measurements and certain psychological
data. Further discussion of such limitations of (6) will be found in
references 2, 14, and 15, listed in the Introduction.
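A brief numerical sketch may make both the definition and the caution concrete. The Python below is illustrative only: the function name is ours, the weight figures are those found above for Table 21, and the temperature figures are hypothetical, chosen merely to show how a change of origin alters $V$.

```python
def coefficient_of_variation(sigma, mean):
    """Return V = sigma / mean expressed as a percentage (positive mean only)."""
    if mean <= 0:
        raise ValueError("V is intended for variates with a positive mean")
    return 100.0 * sigma / mean

# Glasgow weights (Table 21): mean 47.712 lbs., sigma 5.772 lbs.
v_weights = coefficient_of_variation(5.772, 47.712)

# Hypothetical temperatures: the same readings expressed on two scales.
# Centigrade: mean 20, sigma 5.  Fahrenheit: mean 68, sigma 9.
v_centigrade = coefficient_of_variation(5.0, 20.0)
v_fahrenheit = coefficient_of_variation(9.0, 68.0)

print(round(v_weights, 1), round(v_centigrade, 1), round(v_fahrenheit, 1))
```

The two temperature values of $V$ disagree although the data are the same, because the zero of each scale is arbitrary; this is the situation the paragraph above warns against.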

6. Scaling a Distribution in Terms of $\sigma$. Suppose we lay off
intervals of length $\sigma$ on either side of the mean (Figure 16). Then
for a certain type of distribution known as the normal curve (which
will be considered in the next chapter) the following properties can
be proved:

(1) The percentage of the total frequency lying outside the range
$\bar{x} \pm \sigma$ is 32% approximately.

(2) The percentage outside $\bar{x} \pm 2\sigma$ is 5% approximately.

(3) The range $\bar{x} \pm 3\sigma$ includes practically the whole distribution,
i.e., the total range is $6\sigma$ approximately.

The student will recognize that these ranges are, in standard units,
$t = \pm 1$, $t = \pm 2$, $t = \pm 3$, respectively. These results follow from
the relation

$t = \frac{x - \bar{x}}{\sigma}$, $x = \bar{x} + t\sigma$.

Sometimes it is important in a statistical analysis to know how
nearly the given variates are distributed in accordance with the
above property of the normal curve. The distribution of Table 21
has been scaled off in this manner, with the results shown in Table
22. Figure 17 will be helpful in verifying them.

Fig. 17: Distribution of Table 21 Scaled Off in Units of $\sigma$

We will verify here the 34.8% given in Table 22, and the student
is asked to verify the others in Exercise 2. The range $\bar{x} \pm \sigma$ (Figure
17) evidently includes all the variates represented by the two central
rectangles and proportionate parts of the two adjoining rectangles.
From 39.50 to 41.94 is 2.44, and since the variates are assumed to be
uniformly distributed over the class interval we have 172(2.44/4) =
104.92 for the proportionate number to be excluded in the class
39.5-43.5. Hence the number below $\bar{x} - \sigma$ is (1 + 14 + 56 +
104.92) = 175.92. Similarly, from 53.484 to 55.5 is 2.016, and we
have 156(2.016/4) = 78.624 as the proportionate number excluded
in the class 51.5-55.5. Hence the total above $\bar{x} + \sigma$ is (78.624 +
67 + 23 + 3) = 171.624. So the total number outside $\bar{x} \pm \sigma$ is
(171.624 + 175.92) = 347.544, or about 348, which is 34.8% of the 1000
variates.

Table 22: Results of Scaling Off Table 21
$\bar{x} = 47.712$, $\sigma_x = 5.772$

Range                    Lower limit                        Upper limit                        Frequency outside the given range
                                                                                               Number    Percent
$\bar{x} \pm \sigma$     $\bar{x} - \sigma = 41.940$        $\bar{x} + \sigma = 53.484$        348       34.8
$\bar{x} \pm 2\sigma$    $\bar{x} - 2\sigma = 36.168$       $\bar{x} + 2\sigma = 59.256$       60        6.0
$\bar{x} \pm 3\sigma$    $\bar{x} - 3\sigma = 30.396$       $\bar{x} + 3\sigma = 65.028$       3         0.3

This result could also be obtained as follows: By forming a cum $f$ table and
interpolating in the end $x$ column we find

cum $f$ at $x = 53.484$:                         828
cum $f$ at $x = 41.940$:                         176
Number in the $(\bar{x} \pm \sigma_x)$ interval:   652
Number outside this interval:                    348
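The interpolation just described lends itself to mechanical treatment. The sketch below, in Python, is one way of doing it under the same assumption the text makes, namely that the variates are uniformly distributed over each class interval; the function names are ours and are not part of the text.

```python
# Table 21: class marks and frequencies; each class extends 2 lbs. either side.
marks = [29.5, 33.5, 37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]
freqs = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]
width = 4.0

def cum_f(x):
    """Interpolated cumulative frequency below x (uniform spread within a class)."""
    total = 0.0
    for mark, f in zip(marks, freqs):
        lower, upper = mark - width / 2, mark + width / 2
        if x >= upper:
            total += f                           # whole class lies below x
        elif x > lower:
            total += f * (x - lower) / width     # proportionate part of the class
    return total

def number_outside(mean, sigma, t):
    """Number of variates outside the range mean +/- t*sigma."""
    below = cum_f(mean - t * sigma)
    above = sum(freqs) - cum_f(mean + t * sigma)
    return below + above

outside_one_sigma = number_outside(47.712, 5.772, 1)
print(round(cum_f(41.940), 2), round(cum_f(53.484), 2), round(outside_one_sigma, 3))
```

For $t = 1$ the script reproduces the figures 175.92 and 171.624 obtained above.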


7. Semi-interquartile Range in Terms of $\sigma$. The range $(Q_3 - Q_1)/2$
when expressed in units of $\sigma$ has a significance in a normal distribution,
as will be shown later. We will denote this by $s$; hence

$s = \frac{Q_3 - Q_1}{2\sigma} = \frac{Q}{\sigma}$, where $Q = \frac{Q_3 - Q_1}{2}$.

For the present we merely calculate its value in the exercises below.


Exercises 

1. Find the mean and standard deviation for the distribution of Lengths of
Telephone Calls, given in Table 8 (Chapter I). Use Charlier's check.

2. In the three distributions named, show that the percentages outside
$\bar{x} \pm t\sigma$ for $t = \pm 1$, $\pm 2$, and $\pm 3$, are as stated in
Table 23. Verify also the values of $s$.

Table 23

Distribution       $N$     Percent outside                                                  $s$
                           $\bar{x} \pm \sigma$   $\bar{x} \pm 2\sigma$   $\bar{x} \pm 3\sigma$
Glasgow girls      1000                                                   0.3               0.675
Telephone calls     995                                                   0.4               0.69
Span               2000    31.8                                           0.5               0.665

8. N Small. Ungrouped Data. When $N$ is small it is seldom desirable
to attempt an arrangement of the variates into a frequency
distribution. Moreover, in this case, the values of $\alpha_3$ and $\alpha_4$ are not
usually needed because the applications of these measures relate to
characteristics of large distributions. Therefore, only the mean and
standard deviation are usually required for a small set of ungrouped
data. The following methods will help the student become familiar
with the several formulas for $\sigma$, which may be used in this case.

Table 24: Average Yields of Corn in Bushels per Acre
for a Certain Section in Illinois from 1901-1920

Year      Yield ($x$)    $u$     $u^2$
1901        21          -15      225
1902        39            3        9
1903        32           -4       16
1904        37            1        1
1905        40            4       16
1906        36            0        0
1907        36            0        0
1908        32           -4       16
1909        36            0        0
1910        39            3        9
1911        33           -3        9
1912        40            4       16
1913        27           -9       81
1914        29           -7       49
1915        36            0        0
1916        30           -6       36
1917        38            2        4
1918        36            0        0
1919        36            0        0
1920        35           -1        1
Totals    $N = 20$      -32      488

Method I. The indirect method involving the $u$ unit may still be
used for finding the first and second moments. Since each variate
is being treated separately $f = 1$, and we compute the values of

$\nu_r = \frac{1}{N} \sum u^r$

for $r = 1$ and 2. If the values of $x$ are unequally spaced
we take $c = 1$ and let $u = x - x_0$ which changes the origin but not
the units. In other words, the procedure is the same as for a frequency
distribution except that $f = 1$ and $c = 1$.

Example. Find the mean and standard deviation for Table 24. $N = 20$.
We choose $x_0 = 36$.


Computations:

$\nu_1 = \frac{-32}{20} = -1.6$

$\bar{x} = x_0 + \nu_1 = 36 - 1.6 = 34.4$ bushels

$\nu_2 = \frac{488}{20} = 24.4$

Therefore,

$\mu_2 = \nu_2 - \nu_1^2 = 21.84$.

$\sigma_x = \sigma_u = \sqrt{21.84} = 4.67$ bushels.

Table 25

$x$      $x' = x - \bar{x}$     $x'^2$
21         -13.4               179.56
27          -7.4                54.76
29          -5.4                29.16
30          -4.4                19.36
32          -2.4                 5.76
32          -2.4                 5.76
33          -1.4                 1.96
35           0.6                  .36
36           1.6                 2.56
36           1.6                 2.56
36           1.6                 2.56
36           1.6                 2.56
36           1.6                 2.56
36           1.6                 2.56
37           2.6                 6.76
38           3.6                12.96
39           4.6                21.16
39           4.6                21.16
40           5.6                31.36
40           5.6                31.36
688      $\sum |x'| = 73.6$    436.80


Method II. When $f = 1$, formula (5) becomes

(7) $\sigma_x = \left[ \frac{1}{N} \sum (x - \bar{x})^2 \right]^{1/2}$

and sometimes it is best to compute the standard deviation directly
from this definition, without the use of the $u$ unit. Thus the origin
is placed at the mean and all indirect methods are abandoned. If
the mean deviation is also desired, clearly this method should be
used. It is exemplified in Table 25 for the preceding example, and
the variates have been arranged in order of magnitude.

$\bar{x} = \frac{688}{20} = 34.4$ bushels

$\sigma_x^2 = \frac{436.80}{20} = 21.84$

$\sigma_x = 4.67$ bushels

$MD = \frac{1}{N} \sum |x'| = \frac{73.6}{20} = 3.68$ bushels.


Method III. From the relation

$\mu_2 = \nu_2 - (\nu_1)^2$

we have

$\mu_2 = \frac{1}{N} \sum x^2 - \bar{x}^2$

when $f = 1$. Therefore $\sigma$ may be written

(8) $\sigma_x = \left[ \frac{1}{N} \sum x^2 - \bar{x}^2 \right]^{1/2}$.

This method is perhaps the best when the values of $x$ are not large or
when a table of squares is available. It is illustrated below for the
preceding example. (See Table 26.)

Table 26

$x$       $x^2$
21         441
27         729
29         841
30         900
32        1024
32        1024
33        1089
35        1225
36        1296
36        1296
36        1296
36        1296
36        1296
36        1296
37        1369
38        1444
39        1521
39        1521
40        1600
40        1600
688     24,104

Computations:

$\bar{x} = \frac{688}{20} = 34.4$ bushels

$\bar{x}^2 = (34.4)^2 = 1183.36$

$\frac{1}{N} \sum x^2 = \frac{24,104}{20} = 1205.20$

$\sigma_x = [1205.20 - 1183.36]^{1/2}$

$= (21.84)^{1/2}$

$= 4.67$ bushels.
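The agreement of the three methods can be confirmed by machine. The following Python sketch (variable names ours) applies each method in turn to the twenty yields of Table 24, taking $x_0 = 36$ for Method I as in the example.

```python
from math import sqrt

# Table 24: corn yields, 1901-1920, in bushels per acre.
yields = [21, 39, 32, 37, 40, 36, 36, 32, 36, 39,
          33, 40, 27, 29, 36, 30, 38, 36, 36, 35]
N = len(yields)

# Method I: moments about a provisional origin x0, with c = 1.
x0 = 36
u = [x - x0 for x in yields]
nu1 = sum(u) / N
nu2 = sum(v * v for v in u) / N
mean_1 = x0 + nu1
sigma_1 = sqrt(nu2 - nu1 ** 2)

# Method II: deviations taken directly from the mean, formula (7).
mean_2 = sum(yields) / N
sigma_2 = sqrt(sum((x - mean_2) ** 2 for x in yields) / N)
mean_deviation = sum(abs(x - mean_2) for x in yields) / N

# Method III: mean of the squares less the square of the mean, formula (8).
sigma_3 = sqrt(sum(x * x for x in yields) / N - mean_2 ** 2)

print(round(mean_1, 2), round(sigma_1, 2), round(sigma_2, 2),
      round(sigma_3, 2), round(mean_deviation, 2))
```

Method III subtracts two large and nearly equal numbers, so when the variates are large a machine working to limited precision may lose accuracy; Methods I and II are less exposed to this.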


Miscellaneous Exercises 

1. (a) Verify that the algebraic sum of the numbers in the $x'$ column of Table 25
is zero.

(b) Verify the value of mean deviation given for Table 25.

2. Using your own judgment as to the most appropriate method, find the mean
and standard deviation for each of the two sets of data, $x_1$ and $x_2$:

$x_1$:  88  95  68  73  75  88  57  68  62  79  73  74  78
        80  57  65  69  74  78  72  59  47  56  67  43
Answers: $\bar{x}_1 = 69.80$, $\sigma_1 = 12.13$

$x_2$:  82  86  75  78  72  79  63  65  67  75  68  70  79
        78  51  58  65  69  68  83  80  42  43  48  47
Answers: $\bar{x}_2 = 67.64$, $\sigma_2 = 12.68$


3. Complete the computations and find the mean and variance of the following
distribution:

Hint. Here we let $v = y - y_0$. Then $\bar{y} = \bar{v} + y_0$, $\sigma_y = \sigma_v$, since $c = 1$.
(See Theorem on p. 69.)

Ans. $\bar{y} = 87.31$, $\sigma_y^2 = 56.66$.

4. Data have been gathered showing the points scored on a mental test by
290 prospective employees and the per cent of standard production
attained by these same 290 persons after being employed.¹ The following
statistics were obtained:

Mental test: mean = 43.33 pts.
$\sigma$ = 9.25 pts.

Productive ability: mean = 92.02%
$\sigma$ = 24.47%

(a) Compare the relative dispersion in mental test and productive ability.

(b) What factors, other than mental level, may have affected dispersion
under factory conditions?

¹ Wembridge, "Experiment and Statistics in the Selection of Employees,"
Journal of the American Statistical Association, March 1923, p. 605.

5. Read and abstract the article "Variability," Journal of Educational Research,
vol. 4, no. 3, pp. 221-26.

6. Find the median for Table 26.

7. Find $\bar{x}$, $\sigma_x$, $MD$, and $Q$ for the following distribution.

mid-$x$    2    4    6    8    10
$f$        1    4    6    4    1


8. Show that (8) may be written as follows:

$\sigma_x = \frac{1}{N} \left[ N \sum x^2 - \left( \sum x \right)^2 \right]^{1/2}$.

9. If the variates are all equal, say each $x_i = k$, show that $\bar{x} = k$ and $\sigma = 0$.

10. For a set of ungrouped data it is found that $N = 15$, $\sum x = 480$, $\sum x^2 = 15,735$.
Find $\bar{x}$ and $\sigma_x$.

11. Find the variance of the following data.

6.7 6.2 6.6 6.0 6.3 6.8 6.7 6.0 6.0 6.8

Ans. $\sigma_x^2 = .064$.

12. Prove the identity:

$(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_N - \bar{x})^2 = (x_1^2 + x_2^2 + \cdots + x_N^2) - N\bar{x}^2$.
13. Compute the mean deviation (from the mean) for the following data:

$x$    2    4    6     8    10    17
$f$    1    6    10    7    2     2

Ans. $MD = 33/14$.

14. Verify the identity (where $\bar{x}$ is the mean of $x_1$ and $x_2$):

$(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 = \frac{1}{2}(x_1 - x_2)^2$,

and thus show that, for two variates,

$\sigma = \frac{|x_1 - x_2|}{2}$.

15. Verify the identity (where $\bar{x}$ is the mean of $x_1$, $x_2$, $x_3$):

$3(x_1 - x_2)^2 + (x_1 + x_2 - 2x_3)^2 = 6 \sum_{i=1}^{3} (x_i - \bar{x})^2$.

9. The Standard Deviation of the Combination of Sets. The
following theorems involving $\sigma$ are interesting in themselves and
have useful applications.

The relation $\mu_2 = \nu_2 - \nu_1^2$ is true in a more general sense than we
have previously used. Its generalized meaning will be revealed in
our first theorem.

Theorem I. The second moment about the mean equals the second
moment about an arbitrary point $P(x_0, 0)$ minus the square of the
distance between the mean and $P$.

Stated in symbols the theorem may be clearer. Suppose we have
a set of $N$ variates whose mean is $\bar{x}$. Graphically, $\bar{x}$ is a point on the
x-axis. Then if $P$ is any other point on the x-axis, according to
Theorem I we have

(9) $\mu_2 = \frac{1}{N} \sum (x - x_0)^2 - (\bar{x} - x_0)^2$.

To prove this relation we may write

$(x - \bar{x}) = (x - x_0) - (\bar{x} - x_0)$.

Then

$\frac{1}{N} \sum (x - \bar{x})^2 = \frac{1}{N} \sum [(x - x_0) - (\bar{x} - x_0)]^2$,

the right member of which simplifies into the right member of (9),
since $\frac{1}{N} \sum (x - x_0) = \bar{x} - x_0$.

The generality of the theorem consists in extending the original
definition of $\nu_2$ and $\nu_1$ so that they refer to moments about any point
$P$ on the x-axis (except $\bar{x}$), and not merely about zero. Thus now,
$\nu_2 = \frac{1}{N} \sum (x - x_0)^2$. If we take $x_0 = 0$ we have the original
definition of $\nu_2$. Also, when $P$ moves back to zero, we see that $\nu_1$
becomes $\bar{x}$. In other words, the original definitions of the $\nu$'s are merely
the more general definitions when zero is the value chosen for the
arbitrary point. (See (lo) of Chapter IV.)

Theorem II. The sum of the squares of deviations of the variates
from their mean is less than the sum of the squares of the deviations of
the variates from any other value. Therefore $\sigma$ is less than any similar
"root-mean-square."

The proof consists in showing that $\mu_2 < \nu_2$ which is left to the
student as an exercise.

Theorem III. Let there be one set of $n_1$ variates $x_{1i}$ ($i = 1, 2, \cdots, n_1$)
and another set of $n_2$ variates $x_{2i}$ ($i = 1, 2, \cdots, n_2$) and let $\bar{x}$ be the
mean of the combined sets (Theorem VIII, Chapter III). The variance
$\sigma^2$ of the set formed by the combination of these two sets is given by
the following formula:

(10) $N\sigma^2 = \sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x})^2$

where $N = n_1 + n_2$.

Proof: The proof consists in showing that

$\sum_{1}^{N} (x - \bar{x})^2 = \sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x})^2$,

which is left as an exercise for the student.

The above theorem is not very important in itself but it is useful
in proving the next theorem which gives the relation between the
variance of a composite set and the variances of sub-sets.

Theorem IV. Let the frequency, mean, and standard deviation be
denoted by $n_1$, $\bar{x}_1$, and $\sigma_1$ for one set of variates and by $n_2$, $\bar{x}_2$, and
$\sigma_2$ for a second set. The variance $\sigma^2$ of the composite set is given by
the following relation:

$N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2$

where $N = n_1 + n_2$, $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$, and $\bar{x}$ is the mean of
the composite set.

Proof: For the $n_1$ set, $\bar{x}$ may be regarded as an arbitrary point $P$.
Hence by Theorem I we have

$\sigma_1^2 = \frac{1}{n_1} \sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 - (\bar{x}_1 - \bar{x})^2$.

Multiplying through by $n_1$ this becomes

(11) $n_1\sigma_1^2 = \sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 - n_1 d_1^2$.

Similarly for the $n_2$ group we have

(12) $n_2\sigma_2^2 = \sum_{i=1}^{n_2} (x_{2i} - \bar{x})^2 - n_2 d_2^2$.

Adding (11) and (12), and using (10), we obtain

$n_1\sigma_1^2 + n_2\sigma_2^2 = N\sigma^2 - n_1 d_1^2 - n_2 d_2^2$.

Hence,

(13) $N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2$.

For $k$ sets combined into a single set we can generalize (13) into
the following relation:

(14) $N\sigma^2 = \sum_{i=1}^{k} n_i\sigma_i^2 + \sum_{i=1}^{k} n_i d_i^2$

where $N = \sum_{i=1}^{k} n_i$ and $d_i = \bar{x}_i - \bar{x}$. It is interesting to observe that
$\frac{1}{N} \sum_{i=1}^{k} n_i d_i^2$ is the variance of the means of the sub-sets. Thus we have
the important relation

(14a) $\sigma^2 = \frac{1}{N} \sum_{i=1}^{k} n_i\sigma_i^2 + \frac{1}{N} \sum_{i=1}^{k} n_i d_i^2$

which shows that the total variance may be broken up into two parts,
one of which is the weighted mean of the variances in the sub-sets
and the other is the variance of their means. These two parts are
sometimes called the average variance within classes and the variance
between the means of the classes. They become very important in
the "Analysis of Variance" (which is explained in Part II).
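The separation of the total variance into its two parts may be seen at work in a small computation. The Python sketch below is illustrative only: the function `combine` and the two little sets of numbers are hypothetical, introduced here for the purpose, and the standard library's `pvariance` (which divides by $N$, as the text does) serves as an independent check on (18a) and (14a).

```python
from statistics import fmean, pvariance

def combine(groups):
    """groups: list of (n_i, mean_i, variance_i) for the sub-sets.

    Returns (N, mean, variance, within, between) for the composite set,
    where variance = within + between as in (14a).
    """
    N = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / N                   # (18a)
    within = sum(n * v for n, _, v in groups) / N                 # mean of the variances
    between = sum(n * (m - mean) ** 2 for n, m, _ in groups) / N  # variance of the means
    return N, mean, within + between, within, between

# Two hypothetical sub-sets.
set_a = [2, 4, 6]
set_b = [1, 3, 5, 7, 9]
groups = [(len(s), fmean(s), pvariance(s)) for s in (set_a, set_b)]

N, mean, variance, within, between = combine(groups)
direct_variance = pvariance(set_a + set_b)   # computed from the pooled variates

print(N, mean, round(variance, 6), round(direct_variance, 6))
```

The function never sees the individual variates, only the frequency, mean, and variance of each sub-set, which is precisely the information Theorem IV supposes to be given.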
Corollary I. Equation (13) may be written in the following form:

(15) $N\sigma^2 = n_1(\sigma_1^2 + \bar{x}_1^2) + n_2(\sigma_2^2 + \bar{x}_2^2) - N\bar{x}^2$.

Proof: Since

$n_1 d_1^2 = n_1(\bar{x}_1 - \bar{x})^2 = n_1\bar{x}_1^2 - (2n_1\bar{x}_1\bar{x} - n_1\bar{x}^2)$

and

$n_2 d_2^2 = n_2(\bar{x}_2 - \bar{x})^2 = n_2\bar{x}_2^2 - (2n_2\bar{x}_2\bar{x} - n_2\bar{x}^2)$

the proof consists in showing that the sum of the terms in the end
parentheses above reduces to $N\bar{x}^2$. Rearranging these terms their
sum is

$2\bar{x}(n_1\bar{x}_1 + n_2\bar{x}_2) - \bar{x}^2(n_1 + n_2)$,

which by Theorem VIII (Chapter III) becomes

$2\bar{x} N\bar{x} - \bar{x}^2 N = N\bar{x}^2$.

Generalizing for $k$ groups, (15) becomes

(16) $N\sigma^2 = \sum_{i=1}^{k} n_i(\sigma_i^2 + \bar{x}_i^2) - N\bar{x}^2$.

Corollary II. Equation (13) may also be written in the form:

(17) $N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + \frac{n_1 n_2}{N} (\bar{x}_1 - \bar{x}_2)^2$.

The proof consists in showing that

$n_1 d_1^2 + n_2 d_2^2 = \frac{n_1 n_2}{N} (\bar{x}_1 - \bar{x}_2)^2$.

This is left as an exercise.

For purposes of computation, (17) may be more convenient than
either (13) or (15) because it does not require $\bar{x}$, but it does not lend
itself to a generalization for $k$ sets. Generalizations may be useful
both for computing and for theoretical purposes. Formula (14) is
particularly useful in developing the theory of a later section.

For convenience, the formulas of Theorem VIII, Chapter III, are
repeated here:

(18) $\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$,

(18a) $\bar{x} = \frac{1}{N} \sum_{i=1}^{k} n_i\bar{x}_i$.

Theorem V. Consider $k$ sets. Suppose the second moment of each
set is taken about the mean, $\bar{x}$, of the combined sets. Let $\nu_2^{(i)}$ represent
this moment for the $i$th set. Then the variance for the combined sets
is given by

(19) $N\sigma^2 = n_1\nu_2^{(1)} + n_2\nu_2^{(2)} + \cdots + n_k\nu_2^{(k)} = \sum_{i=1}^{k} n_i\nu_2^{(i)}$

where $n_i$ represents the frequency in the $i$th set and $\sum n_i = N$.

Proof: We may write (10) in the form

$N\sigma^2 = n_1\nu_2^{(1)} + n_2\nu_2^{(2)}$.

So, generalizing this form of (10) for $k$ sets, we obtain (19).

The next theorem gives the standard deviation of the distribution
formed by the first $N$ integers, that is, when $x = 1, 2, 3, \cdots, N$. It is
useful in cases when the variates are recorded not by measurements
but by their respective positions when ranked in order with respect
to some character or property.

Theorem VI. The standard deviation $\sigma$ of the first $N$ natural numbers
is given by

(20) $\sigma = \sqrt{\frac{N^2 - 1}{12}}$.

Proof: By a fundamental definition we have

$\sigma^2 = \frac{1}{N} \sum x^2 - \bar{x}^2$,

and by Theorems IV and V of Chapter III, this becomes

$\sigma^2 = \frac{1}{6}(N + 1)(2N + 1) - \frac{1}{4}(N + 1)^2$

which reduces to

$\sigma^2 = \frac{N^2 - 1}{12}$

whence we obtain (20).
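Formula (20) is readily checked against a direct computation. The short Python sketch below (function names ours) compares the two for several values of $N$.

```python
from math import sqrt

def sigma_by_formula(N):
    """Standard deviation of 1, 2, ..., N from formula (20)."""
    return sqrt((N * N - 1) / 12)

def sigma_direct(N):
    """Standard deviation of 1, 2, ..., N computed from the definition."""
    xs = range(1, N + 1)
    mean = sum(xs) / N
    return sqrt(sum((x - mean) ** 2 for x in xs) / N)

for N in (2, 5, 10, 100):
    print(N, round(sigma_by_formula(N), 4), round(sigma_direct(N), 4))
```

For $N = 2$ both give 1/2, in agreement with the result for two variates that the standard deviation is half their difference.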

10. Graphical Representation. We have shown that, if certain
statistics are given for two sub-sets,

Sub-sets         $n_1$    $\bar{x}_1$    $\sigma_1$
                 $n_2$    $\bar{x}_2$    $\sigma_2$
Composite set    $N$      $\bar{x}$      $\sigma_x$

the corresponding statistics for the composite set may be obtained
by means of (13) and (18a). We have been thinking of these statistics
as relating to distributions in the x-direction. The following
diagrams show how the means and standard deviations of three such
distributions may be represented geometrically by the points whose
ordinates are zero and whose abscissas are, respectively, $\bar{x}_1$, $(\bar{x}_1 \pm \sigma_1)$;
$\bar{x}_2$, $(\bar{x}_2 \pm \sigma_2)$; and $\bar{x}$, $(\bar{x} \pm \sigma_x)$. The points are plotted on three
different axes to avoid confusion, but they are to be thought of as
being referred to the same origin and plotted on the same scale.

It should be clear that Theorems I-IV (§9) will apply to distributions
in the y-direction as well as in the x-direction. In particular,
it is obvious that (13) and (18a) hold if we replace $x$ by $y$. Then
the graphical representation of the means and standard deviations

Sub-sets         $n_1$    $\bar{y}_1$    $\sigma_1$
                 $n_2$    $\bar{y}_2$    $\sigma_2$
Composite set    $N$      $\bar{y}$      $\sigma_y$

is shown below.

It will be helpful to discuss one more notion in this connection.
Suppose the $y$ composite set is made up of $k$ sub-sets and the means
$\bar{y}_1$, $\bar{y}_2$, $\cdots$, $\bar{y}_k$, of these sub-sets are plotted on the y-axis as shown
by the labels on the left side of the axis in the figure on page 106.

We will denote the standard deviation of these means by $\sigma_{\bar{y}}$.
Then the points $\bar{y}$, $(\bar{y} \pm \sigma_{\bar{y}})$, and $(\bar{y} \pm \sigma_y)$, may be plotted as shown.
We would expect less variability among the means of the sub-sets
than among the $y$'s of the composite set, that is, that $\sigma_{\bar{y}}$ would be
less than $\sigma_y$. It is clear that (14) and (14a) hold when $x$ is replaced
by $y$.

A grasp of these notions will help in the analysis of Table 27 which
the student is asked to make in problems 5 and 6 below.

Exercises 

1. (a) Show that $\nu_1 = \frac{1}{N} \sum (x - x_0) = (\bar{x} - x_0)$.

(b) Derive equations (9) and (13). If $n_1 = n_2$, what does (13) reduce to?

2. Given the following information about two sets of data:

I                      II
$n_1 = 20$             $n_2 = 30$
$\bar{x}_1 = 25$       $\bar{x}_2 = 20$
$\sigma_1^2 = 5$       $\sigma_2^2 = 4$

Find the mean and variance of the composite set.

3. Think of the two groups in Exercise 2, page 97, as combined into a single
set.

(a) Find the mean of the combined set by formula (18).

(b) Find the standard deviation of the combined set using result of (a) and
formula (13). Ans. $\bar{x} = 68.72$, $\sigma = 12.45$.

4. Using Theorem VI find the mean and standard deviation of the first 25
natural numbers.

5. Consider Table 27. Observe that the first and last columns form a frequency
distribution and that columns (1) to (8) are subdistributions whose totals
add up to $N = 260$ which is also the sum of the last column. Let $n_i$
represent the frequency in the $i$th column and answer the following
questions: $n_1 = ?$, $n_4 = ?$, $n_8 = ?$, $\sum_{i=1}^{8} n_i = ?$ Let $\bar{y}_i$ and $\sigma_i^2$ represent
mean and variance in the $i$th column. Find the mean and variance of
each of the columns (1) to (8), first in $v$ units where $v = (y - 85)/10$.
Check your answers with those given at the bottom of the table.








[Table 27: body of the table not legible; surviving entries from the foot of
the columns are 81.87, 246.48; 72.14, 191.83; 67.86, 106.12]

6. Using formulas (18a) and (14) find the mean $\bar{y}$ and variance $\sigma_y^2$ of the total
distribution in Table 27 and check your answers with those given at the
bottom of the last column.


Hint. The student will observe that the means, $\bar{y}_i$, of the columns in
Table 27 are the values denoted by $y$ in Exercise 3, page 97. The weighted
mean of these mean values is the mean of the whole table. That is,
from (18a),

$\bar{y} = \frac{1}{N} \sum_{i=1}^{k} n_i\bar{y}_i = 87.31$.

The answer 56.66 (Exercise 3) is the variance, $\sigma_{\bar{y}}^2$, of the means of the
columns of Table 27 and is not to be confused with the variance $\sigma_y^2$ of the whole
table. In using (14), $\sigma^2$ is the variance of the whole table, $\sigma_i^2$ is the
variance of the $i$th column, and the expression $\sum_{i=1}^{k} n_i d_i^2$ equals $N\sigma_{\bar{y}}^2$, where
$\sigma_{\bar{y}}^2$ is the variance of the means of the columns since now $d_i = \bar{y}_i - \bar{y}$.

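7. In Theorem V (§9) show that

$\nu_2^{(i)} = \sigma_i^2 + d_i^2$.

Hence prove that (19) may be derived from (14) by showing that (14)
may be written as follows:

$N\sigma^2 = \sum_{i=1}^{k} n_i(\sigma_i^2 + d_i^2)$.

8. (a) Derive the following relation from (18a),

$\bar{x}_1 = \frac{1}{n_1} \left[ N\bar{x} - \sum_{i=2}^{k} n_i\bar{x}_i \right]$.

What does this formula become when $k = 2$?

(b) Derive the following relation from (16),

$\sigma_1^2 = \frac{1}{n_1} \left[ N(\sigma^2 + \bar{x}^2) - \sum_{i=2}^{k} n_i(\sigma_i^2 + \bar{x}_i^2) \right] - \bar{x}_1^2$.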

9. In a certain distribution of $N = 25$ measurements it was found that $\bar{x} = 56$
inches and $\sigma = 2$ inches. After these results were computed it was
discovered that a mistake had been made in one of the measurements which
was recorded as 64 inches. Find the mean and standard deviation if the
incorrect variate, 64, is omitted.

Hint. Let $n_1 = 24$, $n_2 = 1$. Then $\bar{x}_2 = 64$ and $\sigma_2 = 0$. To find $\bar{x}_1$ and
$\sigma_1$ use formulas in Exercise 8 above.
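The procedure suggested in the hint can be put in general form. The Python sketch below is offered tentatively: the function `remove_variate` is hypothetical, it rests on the relations (18a) and (16) with $k = 2$, $n_2 = 1$, and $\sigma_2 = 0$, and the figures used to exercise it are invented rather than taken from the exercise.

```python
from math import sqrt

def remove_variate(N, mean, sigma, x_removed):
    """Mean and standard deviation of the N - 1 variates that remain
    after a single variate x_removed is deleted from a set of N."""
    n1 = N - 1
    mean_1 = (N * mean - x_removed) / n1                               # from (18a)
    second_moment_1 = (N * (sigma ** 2 + mean ** 2) - x_removed ** 2) / n1
    variance_1 = second_moment_1 - mean_1 ** 2                         # from (16)
    return mean_1, sqrt(variance_1)

# Hypothetical figures: 10 variates with mean 50 and sigma 4; the variate 59 is deleted.
new_mean, new_sigma = remove_variate(10, 50.0, 4.0, 59.0)
print(round(new_mean, 3), round(new_sigma, 3))
```

Deleting a variate that lies far from the mean lowers the standard deviation, as one would expect.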

10. If two or more variates are deleted from a distribution for which $N$, $\bar{x}$, and $\sigma$
are given, show how to compute the mean and variance of the remaining
variates.

11. Consider a composite set consisting of $k$ sub-sets and let $\sigma_i^2$ and $n_i$ denote,
respectively, the variance and number of variates in the $i$th sub-set,
and $N = \sum_{i=1}^{k} n_i$.

(a) If the sub-sets have equal means, show that the variance of the
composite set is given by

$\sigma^2 = \frac{1}{N} \sum_{i=1}^{k} n_i\sigma_i^2$.

(b) If the sub-sets each contain the same number of variates and have equal
means, show that

$\sigma^2 = \frac{1}{k} \sum_{i=1}^{k} \sigma_i^2$.

CHAPTER VI 


TYPES OF DISTRIBUTIONS. THE NORMAL CURVE 

1. Skewness and Kurtosis. The shapes of frequency distributions 
are not all alike. Unimodal distributions may differ in two ways 
with respect to form. These differences can be described more easily 
if we think in terms of frequency curves. The curve may be quite 




symmetrical, or it may be skew, bulging out on one side more than 
on the other. Secondly, the top of the curve may be narrow and 
peaked, or it may be somewhat flat giving a mound-shape effect. 

The mean and standard deviation are not sufficient to detect these
characteristics, so we need other measures to describe them. Consider,
for example, the three distributions of the weights (in class
units)¹ of different breeds of mice 120-130 days old given in Table 28.
Experiments on mice are important in cancer research. These
distributions are, however, somewhat fictitious, being adapted
from some actual data for purposes of illustration.

Table 28

         A      B      C
$u$      $f$    $f$    $f$
-3        0      1      0
-2        3      1      1
-1        6      5     10
 0        7     11      6
 1        6      5      5
 2        3      1      2
 3        0      1      1
Sums     25     25     25

The student may easily verify that for each of these distributions
we find the same mean and standard deviation, namely,

$\bar{u} = 0$, $\sigma_u = 1.2$.

One may see from their histograms that these distributions
are essentially different in shape even though they all have the
same mean and standard deviation. These differences would
be more pronounced if $N$ were so large that the shapes approached
a regular and smooth form. Such a large value is
called the "population" or "universe" and the value of $N$ that
we usually have at hand is a "sample."

¹ Neither the original units nor the class interval need concern us here.

Fig. 20

Lack of symmetry in a distribution is known as "skewness."
This characteristic is measured by $\alpha_3$. If a distribution is symmetrical
$\alpha_3 = 0$, but $\alpha_3$ may be positive or negative depending upon whether
the long tail of the distribution extends to the right or the left of the
mean. (See Figure 18.)

Figure 19 exhibits curves with different degrees of flatness or
peakedness. The flatness that we are now describing is in the
neighborhood of the mode and is not to be confused with the flatness
of a curve as a whole which is due to spread or dispersion.
The curves in Figure 19 all have the same spread. So their flatness
depends upon the relative amount of material in the vicinity of the
mode. This characteristic of a curve is called "kurtosis" and is
measured by $\alpha_4$. By the calculus it can be demonstrated that $\alpha_4 = 3$
for a certain type of distribution which is called the normal curve.
A frequency curve is said to have positive kurtosis if $\alpha_4 > 3$ and
negative kurtosis if $\alpha_4 < 3$. It seems, however, that any combination
of kurtosis and peakedness may occur.* The values of $\alpha_3$ and
$\alpha_4$ computed for an observed distribution are useful in selecting the
curve which will best represent the type to which that distribution
belongs.

Both $\alpha_3$ and $\alpha_4$ are abstract numbers and therefore skewness and
kurtosis in different distributions may be compared by these measures.
Therefore our definitions are

(1) $\alpha_3$ is a measure of skewness,
    $\alpha_4$ is a measure of kurtosis.
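To see these two measures distinguish distributions that the mean and standard deviation cannot, one may compute them for Table 28. The Python sketch below assumes the usual moment definitions, $\alpha_3 = \mu_3/\sigma^3$ and $\alpha_4 = \mu_4/\sigma^4$, which we take to be those of the chapter on moments; the function name is ours.

```python
def alphas(values, freqs):
    """Return (mean, sigma, alpha_3, alpha_4) for a frequency distribution."""
    N = sum(freqs)
    mean = sum(f * v for v, f in zip(values, freqs)) / N

    def mu(r):
        # r-th moment about the mean
        return sum(f * (v - mean) ** r for v, f in zip(values, freqs)) / N

    sigma = mu(2) ** 0.5
    return mean, sigma, mu(3) / sigma ** 3, mu(4) / sigma ** 4

# Table 28, in class units u = -3, ..., 3.
u = [-3, -2, -1, 0, 1, 2, 3]
table_28 = {
    "A": [0, 3, 6, 7, 6, 3, 0],
    "B": [1, 1, 5, 11, 5, 1, 1],
    "C": [0, 1, 10, 6, 5, 2, 1],
}

for name, f in table_28.items():
    mean, sigma, a3, a4 = alphas(u, f)
    print(name, round(mean, 3), round(sigma, 3), round(a3, 3), round(a4, 3))
```

All three distributions return the same mean and standard deviation, while the last two columns of output differ from one distribution to the next.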

For an unsymmetrical distribution the distance between the mean
and mode may be used to measure the degree of asymmetry or skewness,
because the mean and mode coincide in a symmetrical distribution.
Since we wish any measure of skewness to be a pure number,
we would express this distance in units of the standard deviation,
thus (mean - mode)/$\sigma$. Now it happens that there is a certain
curve known as Pearson's Type III which is used to represent certain
skew distributions, and it can be shown by higher mathematics that,
for this curve,

(2) $\frac{\text{mean} - \text{mode}}{\sigma} = \frac{\alpha_3}{2}$.

So this relation¹ may be used as a formula for obtaining the approximate
mode.

* "A Common Error Concerning Kurtosis," I. Kaplansky, J. Amer. Stat.
Assoc., vol. 40, p. 259, June 1945. In this connection, Professor I. W. Burr
comments: "The shape of the hump of the curve has less influence on $\alpha_4$ than
does the length of the tails. In Figure 19, the curve with $\alpha_4 = 4.6$ should have
the longest tails."

Exercise 

Find $\alpha_3$ and $\alpha_4$ for each of the distributions A, B, and C, in Table 28.

2. Frequency Curves. As the student extends his experience he
finds several types of distributions. It is important in certain problems
to differentiate between them. Differences in type lead to the
study of frequency curves. There are several standard curves to
represent the different types of distributions that arise in practical
statistics.² Each of these is specified by a mathematical function
$y = f(x)$ where $f(x)$ is a general symbol for any function of $x$. It is,
of course, a different expression for each of the different curves.
Such functions are also called distribution functions. A complete
discussion of this subject belongs to the field of advanced statistics.
However, there are some simple concepts relating to frequency
curves which will be useful in our work.

Fig. 21

If a frequency curve is used to represent a given distribution, the 
total area under the curve corresponds to the total frequency N, 

and therefore the partial area under the curve between the ordinates
erected at x = a and x = b (Figure 21) represents the number of
variates with measurement or character between a and b. The limits
between which the theoretical distribution ranges are denoted by
h and k. It is often convenient and causes no loss of generality to
suppose that the total area under the curve is unity or 100%, in
which case the partial area between a and b represents the percentage
of variates having the given character.

¹ Because of this relation some writers use α₃/2 as a measure of skewness
instead of α₃. Also some authors adopt a different convention as to sign, defining
skewness as negative when the mean is greater than the mode.

² See Chapter III, Part II.

In mathematical language the “area under f(x) between a and b”
is called the “integral of f(x) from a to b” and is denoted by the
symbol

    \int_a^b f(x)\,dx.

However, we will abbreviate this symbol and use merely ∫_a^b to
denote such an area.

Without attempting to be rigorous, we may say that the total area
under the curve is the limit of the area of the appropriate histogram
whose rectangles have bases Δx and altitudes f(x), as Δx is taken
smaller and smaller and approaches zero. Thus

    \int f(x)\,dx = \lim_{\Delta x \to 0} \sum f(x)\,\Delta x.

The integral sign ∫ is a conventionalized S and denotes the sum
of elements of area with bases dx and altitudes y = f(x). The letters
written at the top and bottom of ∫ denote the range over which
the sum is taken. Thus

    \int_a^b y\,dx \quad \text{or} \quad \int_a^b f(x)\,dx

represents the area which is bounded by the curve y = f(x), the
ordinates at x = a and x = b, and the x-axis. (Figure 21.)
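The limiting process just described can be watched numerically. The sketch below sums rectangles of base Δx under a frequency curve and lets Δx shrink. The curve f(x) = 6x(1 − x) on the range 0 to 1 is a hypothetical example chosen because its total area is exactly 1; the function names are likewise illustrative.

```python
def histogram_area(f, a, b, n):
    """Sum of n rectangles of base dx = (b - a)/n, altitude f at each mid-point."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))

def f(x):
    # Hypothetical frequency curve on 0 <= x <= 1 with total area 1.
    return 6.0 * x * (1.0 - x)

# The sums settle toward the total area as the bases are made narrower.
for n in (4, 40, 400):
    print(n, histogram_area(f, 0.0, 1.0, n))

# Partial area between the ordinates at x = 0.2 and x = 0.5.
print(histogram_area(f, 0.2, 0.5, 400))
```

Since the total area here is unity, the last figure may be read directly as the proportion of variates between 0.2 and 0.5.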

The integral of y = f(x) from h to k denotes the total frequency N.
Therefore,

    N = \int_h^k f(x)\,dx.

Hence, the proportion of variates having some character x, such that
a < x < b, is given by

    \frac{1}{N} \int_a^b f(x)\,dx.

If N is taken as unity or 100%, then ∫_a^b denotes the percentage
of variates having the given character.

The integral represented by this symbol also denotes the probability
that a variate chosen at random from the universe y = f(x) will
have a value between a and b.

3. The Normal Curve. Perhaps the most important of all frequency
curves is the so-called normal¹ curve whose equation may be
written

    (3)   y = K e^{-h^2 (x - m)^2}

where K, h, and m represent numbers whose significance will be
explained presently. The curve is bell-shaped and is symmetrical
about the line x = m. It was first discovered by a famous French
mathematician, De Moivre, over two hundred years ago and published
in 1733. He obtained it while working on certain problems
in games of chance which were proposed to him by the gamblers of
his day. Because of this origin and because the data from certain
coin- and dice-throwing experiments closely approach it in form, it
is often called the normal probability curve. Actual statistical use
of the normal curve began with the work of the famous mathematical
astronomers, Laplace (1749-1827) and Gauss (1777-1855), each of
whom derived it independently and presumably without knowing of
De Moivre’s treatment.² They found that it represented very well
the errors of observation in the physical sciences. For this reason
it has been called the normal curve of error, where error is used in
the sense of a deviation from the true value. Since that time experience
has shown that it serves quite well to describe many of the
distributions which arise in the fields of biology, education, and sociology.
Much of the theory of statistics is built around it.

The calculus is required to define the moments of a theoretical
distribution specified by a frequency curve y = f(x). (These definitions
are given in Part II.) It turns out that the mean of the distribution
specified by (3) is m and its variance is 1/(2h²). The
constant K is determined so that the area under the curve shall have
some relevant value. In describing an observed distribution by
means of a normal curve, we wish to have the number of area units
under the curve (3) equal to the number N of observed variates.
When this condition is imposed, K = Nh/√π and we see that K
depends also on h. If we adopt the same notation* here as we used
for an observed distribution, we have

    m = \bar{x}, \qquad h = \frac{1}{\sigma\sqrt{2}}, \qquad K = \frac{N}{\sigma\sqrt{2\pi}}.

Upon making these replacements, (3) becomes

    (3a)   y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-(x - \bar{x})^2 / (2\sigma^2)}.

¹ The term “normal” used here should not be interpreted to mean that other
types of distribution are abnormal.

² For a more extensive history see (a) “Bi-centenary of the Normal Curve,”
Jour. Amer. Statistical Assoc., vol. 29 (1934), pp. 72-76. (b) “Mathematical
Statistics” (Carus Monograph), Rietz, Ch. 3.

4. Standard Form. The letters π and e represent numbers which
always have the same values (see §1, Chapter I). But each of the
letters m, h, and K may take on different values in different situations.
Such constants are called parameters, and (3) really represents
a family of curves. Similarly, in (3a), x̄, σ, and N are
parameters. For assigned values they determine, respectively, the
position of the curve along the x-axis, its steepness, and its size;
but they do not have anything to do with its fundamental characteristics
(i.e., those properties which differentiate it from all other
curves). In order to study these characteristic properties it is
convenient to represent the curve by an equation which will be
independent of the parameters; in other words, to eliminate them from
the equation by a transformation. This is accomplished by considering
the total area under the curve as unity, taking the origin at
the mean, and using the standard deviation as the unit of horizontal
measurement. In mathematical language this means that we set
N = 1, and t = (x − x̄)/σ. We will denote the resulting function
by φ(t), that is,

    (4)   \phi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2},

which is called the standard form of the normal curve.

A variable, t, which is distributed in accord with (4) is said to be
normally distributed with mean zero and unit standard deviation.
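The reduction to standard form can be checked numerically. The sketch below writes the curve (3a) and the standard form (4) as two small functions (the names phi and normal_curve are illustrative, not the author's) and confirms that (3a) reduces to (4) when N = 1, the mean is 0, and the standard deviation is 1.

```python
import math

def phi(t):
    """Standard form (4): ordinate of the normal curve at t."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def normal_curve(x, mean, sigma, n):
    """Equation (3a): ordinate at x for total frequency n."""
    z = (x - mean) / sigma
    return n / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-z * z / 2.0)

# Maximum ordinate of the standard curve, at t = 0.
print(round(phi(0.0), 4))

# With N = 1, mean 0, and sigma 1, (3a) and (4) agree.
print(abs(normal_curve(1.3, 0.0, 1.0, 1) - phi(1.3)))
```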

Just as coordinates of points on the curve are denoted by (x, y)
in the case of equation (3a), so in equation (4) t refers to abscissas and
φ(t) refers to ordinates. The relation between the two systems of
coordinates is given by

    (5)   x = \sigma t + \bar{x}

for abscissas, and

    (6)   y = \frac{N}{\sigma}\,\phi(t)

for ordinates. Equation (6) follows from (3a) and (4). If the area
under the curve is taken as unity, then y = (1/σ) φ(t), that is, φ(t) = σy.
This says that since the abscissas are compressed by σ in changing
from arbitrary units into standard units, so the ordinates must be
stretched by σ if the area under the curve is to be the same in the two
scales of measurement.

* In the theory of sampling, Part II, it is necessary to distinguish the moments
of a sample from those of the parent universe by the use of different symbols.

5. Tables of Standard Ordinates and Areas. One of the reasons
for writing the equation in standard form is that the ordinates and
areas may be tabulated once and for all. These tables are given in
the Appendix. We see from (4) that φ(−t) = φ(+t), i.e., the ordinates
for negative values of t are the same as for the corresponding
positive values of t, and the curve is symmetrical about the ordinate
at t = 0. Therefore it is necessary to tabulate values of φ(t) for
positive t’s only. Equation (4) may be graphed by plotting the
points corresponding to a few well chosen values from the tables
and drawing a smooth curve through them. (Figure 22.)

The curve approaches very close to the horizontal axis at each
extremity but is asymptotic, that is, it does not quite touch the axis
no matter how far extended. We say its limits are at −∞ and
+∞. Although the infinite abscissal range is never met in practice
it may be characteristic of the “universe” from which a given
distribution is a sample. Therefore, this infinite feature is useful in
theoretical investigations. Moreover, even in representing observed
distributions the infinite range causes no practical difficulty because
the curve comes down to the horizontal axis very rapidly beyond
t = ±3. The combined area at each extremity beyond t = ±3 is
only .27 of 1% of the total area under the curve.

Partial areas between ordinates erected at various values of t, say
between t = a and t = b, are denoted by ∫_a^b. Thus the area from
t = 0 to t = 1 is given by ∫_0^1 = .3413. (See Table I, Appendix.)
Since the total area under φ(t) is taken as unity the area on either
side of t = 0 is 0.5 and it is only necessary to tabulate the areas ∫_0^t
for positive values of t. Thus the area from t = −1 to t = 0 is equal
to the area from t = 0 to t = 1. In symbols this would be stated
as follows:

    \int_{-1}^{0} = \int_{0}^{1}.

Any other areas required may be found by an appropriate addition
or subtraction of tabular values. For example, suppose the area
below t = −2 is required. This is denoted by ∫_{−∞}^{−2}. Now the area
from −∞ to −2 equals 0.5 minus the area from −2 to 0. And the
area from −2 to 0 is the same as from 0 to 2. That is,

    \int_{-2}^{0} = \int_{0}^{2} = .4772, \qquad \int_{-\infty}^{-2} = .5 - .4772 = .0228.

Both areas and ordinates for decimal values of t between tenths may
be approximated by interpolating between the values given in the
tables.

The illustrative examples following §6 will help the student 
become familiar with the tables. He should verify the answers and 
draw a simple sketch of the curve showing the ordinates or areas in 
each case. 


The symbol ∫_{−∞}^t denotes a cumulative relative frequency, i.e.
the percentage of the total frequency N which is less than t. In order
to find values of ∫_{−∞}^t from the tables, for assigned values of t, the
student should observe (from a figure) that

    \int_{-\infty}^{t} = .5 \pm \int_{0}^{|t|},

the plus or minus sign to be used according as t is positive or negative.
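The rule above for obtaining cumulative areas from a table of areas measured from t = 0 may be sketched as follows. Here the tabulated area is computed from the error function in Python's math module rather than read from the Appendix, so small differences in the last decimal place from the printed tables are to be expected; the function names are illustrative.

```python
import math

def area_0_to(t):
    """Area under the standard curve from 0 to t (t >= 0), as tabulated."""
    return 0.5 * math.erf(t / math.sqrt(2.0))

def cumulative(t):
    """Area from minus infinity to t: .5 plus or minus the tabulated area."""
    tail = area_0_to(abs(t))
    return 0.5 + tail if t >= 0 else 0.5 - tail

print(round(area_0_to(1.0), 4))    # area from t = 0 to t = 1
print(round(cumulative(-2.0), 4))  # area below t = -2
```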

6. Properties. A knowledge of the properties of the normal curve
is essential for an intelligent use of the curve in practical statistics.
A demonstration of some of these properties is beyond the scope of
the present discussion although quite simple in the calculus. The
following properties are the most important and interesting.

1. The mean, median, and mode coincide at t = 0. The height
of the maximum ordinate in standard form is 1/√(2π) because when
t = 0, φ(t) = 1/√(2π) = .3989.

2. Since the standard deviation is the unit of measurement along
the horizontal axis, σ = 1 in the t scale. Any t value may be
converted into the corresponding x value by (5). In the vertical
direction N/σ is the unit of measurement and any φ(t) ordinate may be
converted into y units by means of (6).

The area under (3) in the range from x = c to x = d is denoted by

    \int_c^d y\,dx.

If t = a and t = b denote the corresponding range in standard units,
then

    (7)   \int_a^b \phi(t)\,dt

denotes the corresponding area, in standard units, under (4). It is
shown in the calculus that dx = σ_x dt. Therefore from (6) we have

    (8)   \int_c^d y\,dx = N \int_a^b \phi(t)\,dt.

If the interval goes from x = c to x = d, (8) says that

    (9)   \text{Frequency over } (c, d) = N \int_a^b \phi(t)\,dt

where

    (10)   a = (c - \bar{x})/\sigma_x, \qquad b = (d - \bar{x})/\sigma_x.

This merely means that the percentages (relative frequencies) obtained
from the tables may be converted into numbers (frequencies)
by multiplying the percentages by N.
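Equations (9) and (10) amount to a two-step recipe: convert the end-points to standard units, then multiply the standard area by N. A minimal sketch follows, using NormalDist from Python's statistics module in place of the printed tables. The function name and the sample figures (N = 1500, mean 75, standard deviation 10, interval 65 to 85) are chosen only for illustration.

```python
from statistics import NormalDist

def frequency_between(c, d, mean, sigma, n):
    """Equation (9): expected number of variates between x = c and x = d."""
    std = NormalDist()            # standard curve, mean 0 and sigma 1
    a = (c - mean) / sigma        # equation (10)
    b = (d - mean) / sigma
    return n * (std.cdf(b) - std.cdf(a))

print(round(frequency_between(65.0, 85.0, 75.0, 10.0, 1500), 1))
```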

3. The curve changes from concave to convex at t = ±1. In the
x-scale, referred to the origin of x, these points are at x = x̄ ± σ_x.
They are called points of inflection and their position is important
in making an accurate drawing of the curve.

4. The standard deviation is approximately 25% greater than
the mean deviation. More precisely, MD = σ√(2/π) = .798σ, and
σ = 1.2533 MD.

5. The quartiles, Q₁ and Q₃, are equidistant from t = 0 and therefore
from the mean. By definition

    Q₃ is that value of t for which ∫_{−∞}^t = .75,

i.e., for which ∫_0^t = .25. From the tables this is t = .6745. Therefore
in arbitrary units,

    Q₃ = x̄ + .6745σ_x and Q₁ = x̄ − .6745σ_x.

6. The quartile deviation (semi-interquartile range) for a normal
distribution will be denoted by E. Its value is

    E = \frac{Q_3 - Q_1}{2} = \frac{(\bar{x} + .6745\sigma) - (\bar{x} - .6745\sigma)}{2} = .6745\sigma.

In standard units this is s = E/σ = .6745.
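The constant .6745 in properties 5 and 6 can be recovered without tables by asking for the value of t below which three quarters of the area lies. The sketch below does this with the inv_cdf method of NormalDist from Python's statistics module. The helper name and the mean and standard deviation passed to it (100 and 15) are hypothetical.

```python
from statistics import NormalDist

# Value of t for which the cumulative area is .75 (third quartile).
s = NormalDist().inv_cdf(0.75)

def quartiles(mean, sigma):
    """Return (Q1, Q3, E) for a normal distribution in arbitrary units."""
    e = s * sigma                 # quartile deviation E = .6745 sigma
    return mean - e, mean + e, e

print(round(s, 4))
print(quartiles(100.0, 15.0))
```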

7. The quantity E (or s) has a significance in probability theory.
If a variable x is distributed according to the normal curve, the
probability is one half that a variate selected at random will have a
value between x̄ − E and x̄ + E. The reason for this statement is
that 50% of the variates have values within this range. E is commonly,
though somewhat ambiguously, called “probable error.”

8. E is in units of x whereas s is a value of t, that is, s is the value
t = .6745, and E is the value x = .6745σ_x. Just as σ_x may be used
as a yardstick in scaling off a distribution on either side of the mean
(§6, Chapter V), so may E or s be used in a similar manner. When
thinking of them in this way it is useful to regard E as a yardstick
about two-thirds the length of σ_x. The following table gives the
end-points of certain intervals in t, x′, and x units, respectively, where
t = x′/σ_x and x′ = x − x̄.


        End Points of Certain Intervals in t, x′, x

      When σ is the unit              When E is the unit
    t      x′        x             t          x′           x
    0      0         x̄             0          0            x̄
   ±1     ±σ      x̄ ± σ         ±.6745     ±.6745σ    x̄ ± .6745σ
   ±2     ±2σ     x̄ ± 2σ        ±1.349     ±1.349σ    x̄ ± 1.349σ
   ±3     ±3σ     x̄ ± 3σ        ±2.023     ±2.023σ    x̄ ± 2.023σ


The percentage distribution of area under the normal curve is
given (approximately) in Figure 23 where σ_x is the unit of measurement
along the horizontal axes and in Figure 24 where s is the unit.
The percentages given in the figures may be regarded as abridged
tables. Of course the tables in the Appendix will ordinarily be used
in problems.

With reference to Figure 23, it is sometimes said that if values of x
are normally distributed, the probability that a value chosen at
random will fall within the range x₁ < x < x₂, where x₁ = x̄ − σ_x
and x₂ = x̄ + σ_x, is .68.

9. Astronomers and physicists have called h the “modulus of
precision.” From the relation h = 1/(σ√2), it is evident that h
increases as σ decreases. And as h increases, the curve (with N and
m kept constant) becomes narrower in the neighborhood of m and
in this sense h measures the closeness of the values of x to their mean.

Fig. 23 (horizontal axis marked from −3σ to 3σ)
Fig. 24 (horizontal axis marked from −4s to 4s)

10. The curve is symmetrical and α₃ = 0. The fourth moment
about the mean is equal to three times the square of the second
moment about the mean, i.e., μ₄ = 3μ₂², and therefore α₄ = μ₄/μ₂² = 3.


Examples 

1. Find the ordinates of φ(t) for (a) t = 2.3, (b) t = −2.3, (c) t = .67.

Solutions from the tables in the Appendix:

(a) φ(2.3) = .02833

(b) φ(−2.3) = .02833

(c) φ(.67) = .31874

2. Find the following areas under φ(t) and use the integral notation:

(a) From t = 0 to t = 3.00

(b) From t = 1.5 to t = 2.5

(c) From t = −2 to t = 1.3

(d) From t = 0 to t = 0.6745


Solutions from the tables:

(a) The required area is given by ∫_0^3 which we find to be .49865.

(b) From t = 0 to t = 1.5, ∫_0^{1.5} = .43319, and from t = 0 to
t = 2.5, ∫_0^{2.5} = .49379. Therefore the required area is

    \int_{1.5}^{2.5} = \int_{0}^{2.5} - \int_{0}^{1.5} = .0606.

(c) Since the area from t = 0 to t = −2 is the same as from t = 0 to
t = +2 we have

    \int_{-2}^{1.3} = \int_{0}^{2} + \int_{0}^{1.3} = .47725 + .40320 = .88045.

(d) Here we must interpolate:

    For t = .67,    ∫_0^t = .24857
    For t = .6745,  ∫_0^t = A
    For t = .68,    ∫_0^t = .25175.

Therefore

    \frac{A - .24857}{.25175 - .24857} = \frac{.0045}{.01}

whence A = .2500.
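The interpolation in part (d) is ordinary linear interpolation between two adjacent tabular entries. A minimal sketch follows, using the two tabular values quoted in the solution; the function name is illustrative.

```python
def interpolate(t, t0, a0, t1, a1):
    """Linear interpolation between tabular entries (t0, a0) and (t1, a1)."""
    return a0 + (a1 - a0) * (t - t0) / (t1 - t0)

# Area from t = 0 to t = .6745, between the entries for t = .67 and t = .68.
area = interpolate(0.6745, 0.67, 0.24857, 0.68, 0.25175)
print(round(area, 4))
```

The result agrees with property 5 of §6, where t = .6745 was identified as the value for which the area from 0 to t is one quarter.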




3. Show that for equation (3), the percentages of area outside the given ranges
are as stated below:

    Above x̄ + σ      = 15.87%
    Outside x̄ ± σ    = 31.74%
    Outside x̄ ± 2σ   = 4.56%
    Outside x̄ ± 3σ   = 0.27%

Solution: Converting these ranges into t units, and remembering that only
the positive half of the area under φ(t) is tabulated and equals .5, we have

    Area above t = 1 is     .5 − ∫_0^1 = .1587 = 15.87%
    Area outside t = ±1 is  2(15.87%) = 31.74%
    Area outside t = ±2 is  2(.5 − ∫_0^2) = .0456 = 4.56%
    Area outside t = ±3 is  2(.5 − ∫_0^3) = .0027 = 0.27%

4. Given N = 1500, x̄ = 75, σ_x = 10. If the variates are distributed according
to the normal curve, (a) find the value of x for which cum f = 800, (b) for
which cum f = 450, (c) how many of the N variates lie where x < 80?
Solutions:

(a) By definition, cum f = ∫_{−∞}^x y dx, and from (8),

    \int_{-\infty}^{t} = .5 + \int_{0}^{t} = 800/1500 = .5333, \quad \text{so that} \quad \int_{0}^{t} = .0333,

whence from the tables, t = .083. Substituting in equation (5),
x = 75.83.

(b) We have ∫_{−∞}^t = 45/150 = .3 and t is negative. Since
∫_{−∞}^t = .5 − ∫_0^{|t|}, we have

    \int_{0}^{|t|} = .2,

whence we find that t = −.524 and x = 69.76.

(c) From the relation t = (x − x̄)/σ_x we find that t = .5 when x = 80.
From the tables,

    \int_{-\infty}^{.5} = .69146.

From (8) we have

    \int_{-\infty}^{80} y\,dx = 1500(.69146) = 1037.2.
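The three parts of Example 4 may be checked without tables. The sketch below uses NormalDist from Python's statistics module as a stand-in for the Appendix. Because it does not round t to three places before converting back to x, its answers may differ from the worked solution in the last digit.

```python
from statistics import NormalDist

N, mean, sigma = 1500, 75.0, 10.0
curve = NormalDist(mean, sigma)

x_a = curve.inv_cdf(800 / N)      # (a) x below which 800 variates lie
x_b = curve.inv_cdf(450 / N)      # (b) x below which 450 variates lie
count_c = N * curve.cdf(80.0)     # (c) number of variates with x < 80

print(round(x_a, 2), round(x_b, 2), round(count_c, 1))
```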


Exercises 


1. Find φ(2.65), φ(−1.46), φ(0).

2. Find t if φ(t) = .1257, .0325, .0034, respectively.

3. Find the following areas under φ(t), and draw a figure in each case:


(a) f"y f 7 f 

Jo J- 1.2 J--o> Jl ,2 t/-l 

/ .37 /'.6745 

^ / 

.37 «/-.6745 


4. Find t given the partial areas:

    2∫_0^t = .6,   ∫_0^t = .27457,   ∫_{−t}^{t} = .999730.

5. Verify the percentages given in Figures 23 and 24.

6. (a) How far from the median of a normal distribution is the first quartile?

(b) In a certain normal distribution x̄ = 89 and Q₁ = 75.61. What is σ_x?

7. For a normal distribution: N = 1000, x̄ = 20, σ_x = 2.

(a) What is E?

(b) Find the value of Q₁.

(c) What values of x will include the middle 600?

(d) The middle 75%?

8. If N = 300, x̄ = 76, σ_x = 16, for a normal distribution:

(a) What is the value of the first quartile?

(b) The third quartile?

(c) How many variates are between x = 60 and x = 90?

9. In a college the 8 grades A, A−; B, B−; C, C−; D, and F are given.
On the assumption that mathematical ability is normally distributed,
how many out of a total of 1000 should receive each grade? Assume
that x̄ is the boundary between the C and B− grades and that each grade
interval is .8σ. What range in standard units on either side of x̄ is thereby
assumed to include all the grades?

10. What are the percentages of a normal distribution outside x̄ ± tσ for
t = 1, 2, 3?


7. Curve Fitting. It should be remembered that a set of data
collected and presented in the form of a frequency distribution is
merely a sample of a general type called its universe. Other samples
from that universe might yield somewhat different frequency
distributions.

For certain purposes it may be desirable to fit a normal curve to
a unimodal distribution which is reasonably symmetrical and appears
to be of the normal type. The theoretical curve idealizes the recalcitrant
observational data and smooths out the irregularities due to
sampling fluctuations.

In fitting equation (3a) to a given distribution, we assume that

(1) The given frequency N represented by a histogram equals the area
under the curve, and

(2) The mean and standard deviation of the observed distribution
equal, respectively, the mean and standard deviation of the theoretical
distribution represented by the curve.

A normal curve is a mathematical model of a hypothetical universe.
In identifying such a universe with (3a) only its form is
specified by the model. The parameters are (usually) unknown.
An estimate of a parameter by the use of an appropriate function of
the observed data is called a statistic. Assumption (2) above means,
then, that we replace each of the parameters by the corresponding
statistic.¹

The procedure of fitting a normal curve to an observed distribution
will now be illustrated with the data of Table 21, p. 88. We
substitute

    x̄ = 47.712
    σ_x = 5.772
    N = 1000

in equation (3a), and obtain

    y = \frac{1000}{5.772\sqrt{2\pi}}\, e^{-(x - 47.712)^2 / (2(5.772)^2)}.


To make use of a table of standard ordinates in graphing this
equation we transform it into standard units by setting

    (a)   t = \frac{x - 47.712}{5.772} = .17325x - 8.2661

and write

    (b)   y = \frac{N}{\sigma}\,\phi(t) = 173.25\,\phi(t).

Appropriate values to assign x in equation (a) are the end-x and
mid-x values of the given distribution. The use of a computing
machine in changing x values into corresponding t values is explained
in §6, Chapter IV. Thus we obtain the values in the second column
of Table 29. We may then enter the table in the Appendix
for the corresponding ordinates, φ(t). These are converted into y
values by equation (b). The curve may then be drawn by plotting

the x and y values. (Figure 25.) The curve should be drawn so
as to be symmetrical with respect to the ordinate at the mean and
its points of inflection should be at a distance from the mean equal
to σ. The student should observe that every pair of (x, y) values
computed in Table 29 furnishes two points for the graph, each
symmetrical to the other with respect to the mean ordinate. Both
points should be used in drawing the curve but only the computed
points should be left permanently in the graph.

After the curve is drawn, the histogram for the observed data may
be constructed. The column headed f/c gives the heights of the
rectangles on the same scale as the ordinates of the curve.

    Table 29.  t = .17325x − 8.2661,  y = 173.25 φ(t)

      x         t        φ(t)        y        f/c
    27.5     −3.502     .00086      0.15
    29.5     −3.155     .00275      0.48      0.25
    31.5     −2.809     .00772      1.34
    33.5     −2.462     .01927      3.34      3.50
    35.5     −2.116     .04253      7.37
    37.5     −1.769     .08344     14.46     14.00
    39.5     −1.423     .14494     25.11
    41.5     −1.076     .22361     38.74     43.00
    43.5     −0.730     .30563     52.95
    45.5     −0.383     .37072     64.23     61.25
    47.5     −0.037     .39866     69.07
    49.5      0.310     .38023     65.87     65.75
    51.5      0.656     .32230     55.84
    53.5      1.003     .24124     41.79     39.00
    55.5      1.349     .16060     27.82
    57.5      1.696     .09469     16.41     16.75
    59.5      2.042                 8.59
    61.5      2.389     .02299      3.98      5.75
    63.5      2.735     .00948      1.64
    65.5      3.082     .00346      0.60      0.75
    67.5      3.428     .00111      0.19

Fig. 25. Normal Curve Fitted to Histogram Representing Weight
Distribution of Glasgow Schoolgirls (Table 21)

The smooth curve is plotted from the points (x, y) given in Table 29. The
column headed f/c in that table gives the heights of the rectangles in the
histogram, c = 4. When both the curve and the histogram are to be drawn, it is best
to draw the curve first so that the presence of the histogram will not prejudice
one into trying to make the curve fit the histogram.

¹ It is shown in Part II that a better estimate of the variance in the universe is
obtained by multiplying the variance of the observed distribution by N/(N − 1).
Because of this fact some writers, denoting this result by s², define the variance of
an observed distribution by

    s^2 = \frac{\sum (x - \bar{x})^2}{N - 1}.

The distinction between the two definitions is not an important one, in the
author’s opinion, for beginning students who are learning the descriptive
methodology of statistics. And in curve fitting, the numerical difference is negligible
because N is fairly large. The distinction is important, however, in the theory
of small samples (Part II).
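The first four columns of Table 29 can be regenerated from the three fitted constants alone. The sketch below does so, computing φ(t) directly instead of reading it from the Appendix, so that entries may differ from the printed table by a unit or two in the last place (the table rounds t to three decimals before the ordinate is looked up). The function names are illustrative.

```python
import math

MEAN, SIGMA, N = 47.712, 5.772, 1000   # statistics of Table 21 quoted in the text

def phi(t):
    """Standard ordinate, equation (4)."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def fitted_row(x):
    """Return (t, phi(t), y) for a given x on the fitted curve."""
    t = (x - MEAN) / SIGMA             # equation (a)
    return t, phi(t), N / SIGMA * phi(t)   # equation (b)

# End-x and mid-x values from 27.5 to 67.5 in steps of 2.
for i in range(21):
    x = 27.5 + 2.0 * i
    t, p, y = fitted_row(x)
    print(f"{x:5.1f} {t:7.3f} {p:8.5f} {y:6.2f}")
```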

8. Graduation. The areas under the fitted curve and over the
class intervals are called theoretical frequencies. Thus in Figure 25
the shaded area represents the theoretical frequency corresponding
to the observed frequency which is represented by the rectangle the
mid-point of whose base is 41.5 pounds. The determination of the
theoretical frequencies is called graduation by the normal curve.
It is a process of smoothing out the data to fit the curve. The method
is shown in Table 30 for the data represented by Figure 25.

In order to enter a table of standard areas we must change the
end-x values into t values. These are given in the third column of
Table 30. They are part of the values already computed for Table 29.

The entries in the column headed A = ∫_{−∞}^t are the (cum f)/N
values of the standard curve for the given end-points. The entries in
the column headed ΔA are obtained by differencing the preceding
column. (See last paragraph of §9, Chapter I.) They are the percentages
p = f/N = ΔA to be expected in the various intervals on
the hypothesis of a normal distribution. Therefore NΔA gives the
numbers to be expected, that is, the theoretical frequencies.

The student should study this table until he becomes familiar with
all the operations involved and what they mean. He should distinguish
between the purposes of Tables 29 and 30.
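The graduation just described (cumulate, difference, multiply by N) may be sketched as follows. NormalDist from Python's statistics module takes the place of the table of standard areas, and the class boundaries are assumed to run from 31.5 to 63.5 in steps of c = 4, with the two end classes open. The function name is illustrative, and small differences from a hand computation with four-place tables are to be expected.

```python
from statistics import NormalDist

MEAN, SIGMA, N = 47.712, 5.772, 1000
# Interior class boundaries (end-x values), assumed spaced c = 4 apart.
boundaries = [31.5, 35.5, 39.5, 43.5, 47.5, 51.5, 55.5, 59.5, 63.5]

def theoretical_frequencies(boundaries, mean, sigma, n):
    """Difference the cumulative areas A and scale by n."""
    curve = NormalDist(mean, sigma)
    cum = [0.0] + [curve.cdf(x) for x in boundaries] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cum, cum[1:])]

freqs = theoretical_frequencies(boundaries, MEAN, SIGMA, N)
print([round(f, 1) for f in freqs])
print(round(sum(freqs), 1))
```

Because the first and last cumulative areas are 0 and 1, the theoretical frequencies necessarily total N.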

[Inset to Fig. 25: f = area, c = base, f/c = height.]

                              Table 30

Observed    Boundary                 A =                    NΔA = Theoretical
Frequency      x          t       ∫_{−∞}^t       ΔA             Frequency

              −∞         −∞        .0000
     1                                          .0025               2.5
             31.5      −2.809      .0025
    14                                          .0147              14.7
             35.5      −2.116      .0172
    56                                          .0602              60.2
             39.5      −1.423      .0774
   172                                          .1553             155.3
             43.5      −0.730      .2327
   245                                          .2527             252.7
             47.5      −0.037      .4854
   263                                          .2587             258.7
             51.5       0.656      .7441
   156                                          .1674             167.4
             55.5       1.349      .9115
    67                                          .0679              67.9
             59.5       2.042      .9794
    23                                          .0175              17.5
             63.5       2.735      .9969
     3                                          .0031               3.1
              ∞          ∞        1.0000

Totals
  1000                                         1.0000            1000.0

9. Purpose of a Graduation. If, for the distribution of graduated
frequencies, the mean, standard deviation, and total frequency are
found, their values will be precisely those of the corresponding
moments in the observed frequency distribution. This must be so,
because these were the conditions imposed in the process of graduation.
Moreover, the observed values of skewness and kurtosis as
given by α₃ and α₄ will not differ appreciably from the theoretical
values if the fitting of the normal curve to the observed distribution
was justified.

Since the above parameters characterize a distribution, the observing
student may wonder why a distribution should be graduated
if the values of these constants are unaltered in the process.

There are three main reasons why a student should be taught to graduate a
curve. The first, and least important, has to do with the use of a smooth curve
in place of a jagged sample. The second, and most important, is that it is
necessary for the mathematical development of statistics that the mathematician
should be told what assumptions he may make. These usually depend on the
types of frequency curves which can be depended on to fit phenomena....
A third reason, intermediate in importance between the other two, is that in
testing a priori theories in various fields, it is often necessary to test the efficacy
of the frequency distributions which are results of these theories.¹

The second and third of the above reasons may seem somewhat
abstruse, but it is not easy to give completely satisfactory explanations
of them at this level of exposition. About all we can say at
this time is that the distribution of variation of a variable x about its
mean value is a fundamental statistical concept and in certain theoretical
investigations it is very important that we have mathematical
functions which are capable of representing such distributions.
This is particularly true in sampling theory which will be discussed
in Part II.

The first reason is more readily understood. Occasionally in
practical problems it may be desirable to use the theoretical frequencies
obtained by graduation in place of the observed data which
probably contain irregularities due in part to grouping, in part to
sampling fluctuations. We cite here two illustrations.

Example 1. A company which operates a chain of men’s haberdashery stores
planned to bring out a new line of about 100,000 light weight sport shirts suitable
for camping, hunting, etc. The question arose as to the determination of the
number of each size that should be ordered from the factory. Their previous
distribution of sizes had not been satisfactory because the demand for certain
sizes had been different from the number manufactured. Therefore the statistical
department was requested to recommend the distribution of the proposed order
according to neck sizes. The solution of the problem hinged upon the availability
of data giving the measurements of neck circumferences of a large sample
of men. Satisfactory data were found in the “Reports of the Medical Department
of the United States Army in the World War,” which gave a table of the
neck measurements in centimeters of 95,102 white troops at demobilization.
Since these data are tabulated in class intervals which are slightly different from
the ranges used in standard shirt-band sizes, a slight adjustment was necessary.
But essentially a normal curve was fitted to this distribution and the graduated
frequencies were taken as the number of potential customers for each shirt size.
The result was quite satisfactory.

Example 2. A well known and interesting illustration of the desirability of
smoothing occurs in the census returns. The census takers’ records show more
persons alive at age 30 than at age 29, more at age 35 than at age 34, more at 40
than at 39, etc. This is probably due to the fact that men (as well as women)
do not tell their exact ages. A person who is actually 41 or 42 and known to be
40 or so, says he is 40. The recorded data show artificial bumps at every age
which is a multiple of 5. Naturally the Census Bureau prefers the smoothed
results to the observed. The student should not infer that the curve used to
smooth these data is the normal type. The “life curve” is a continuously
decreasing function. However, the same kind of quinquennial irregularity occurs
in other actuarial data which do approximate the form of a normal curve. Many
examples are given in Elderton, Frequency Curves and Correlation.

¹ Journal of the American Statistical Assoc., vol. XXVI, March 1931,
Supplement, p. 36.

10. Probability. A frequency curve is sometimes called a probability
curve. The link connecting frequencies with probabilities
has its starting point in the following definition:

Definition. If out of N mutually exclusive and equally likely
events, f are distinguished by some property A, the probability of an
event bearing the property A is f/N.

The definition implies that probability is measured by a number in
the range 0 to 1, the lower limit denoting impossibility and the upper
limit denoting certainty.

Since the total area under the curve represented by (4) is unity,
any partial area denoted by (7) can be interpreted as the probability
that a value of t selected at random from a normal distribution
(4) lies between t = a and t = b.

Example 1. Refer to the data of Table 8, Chapter I. Let us assume that a 
normal curve was fitted to this distribution and that the fit seemed (by visual 
inspection) to be reasonably good. Generalizing on the experience shown in the 
table, the telephone company wishes to estimate the probability that a call (of 
the same type of message as that in the table) will be between (say) 500 seconds 
and 600 seconds in length. 

Solution. Using (10), 

a = (500 - 477.3)/148.5 = 0.15, 

b = (600 - 477.3)/148.5 = 0.83. 

Under the implicit assumptions, the required probability is 

$P = \int_{0.15}^{0.83} \phi(t)\,dt = 0.24.$ 
Example 2. Referring to Example 1 above, find the probability that the length 
of a telephone call will differ numerically from the mean of the table by as much 
as 5 minutes. 

Solution. We find $|t| = 300/148.5 = 2.02$. The probability of a deviation 
not greater, numerically, than 300 seconds is 

$P = 2\int_{0}^{2.02} \phi(t)\,dt = 0.96$, approximately. 

Then the probability of a numerical deviation as large as (or larger 
than) 300 seconds is Q = 1 - P = 0.04. This would be represented graphically 
by the area under the curve outside t = ±2.02. 
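
The two areas found in Examples 1 and 2 can be checked without tables. The following is a minimal sketch in Python, assuming the standard normal area is computed from the error function `math.erf`; the helper name `phi_area` is introduced here for illustration only and is not part of the text.

```python
import math

def phi_area(a, b):
    """Area under the standard normal curve between t = a and t = b."""
    cdf = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return cdf(b) - cdf(a)

MEAN, SD = 477.3, 148.5  # seconds, the values used in Example 1

# Example 1: call length between 500 and 600 seconds,
# with t rounded to two decimals as in the text.
a = round((500 - MEAN) / SD, 2)
b = round((600 - MEAN) / SD, 2)
p_between = phi_area(a, b)

# Example 2: a deviation from the mean of 300 seconds or more.
t = round(300 / SD, 2)
q_outside = 1.0 - phi_area(-t, t)

print(round(p_between, 2), round(q_outside, 2))
```

Rounding t to two decimals mirrors the use of a printed table; carrying more decimals in t changes the last figure slightly.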


11. Probability Paper. The cumulative frequencies for the normal 
curve $\phi(t)$ are given by 

$A = \int_{-\infty}^{t} \phi(t)\,dt.$ 

As t varies from $-\infty$ to $+\infty$, A varies from 0 to 1, and for 
the finite range t = ±3 (commonly met in practice) A varies from 0.00135 
to 0.99865. (Verify.) Regarding A as a function of t, values of (t, A) 
from the tables may be plotted and the resulting points joined by a 
smooth curve. 


[Figure 26. The ogive of the normal curve.] 

When graphed on an algebraic scale this curve is the ogive of the 
normal curve. It is also called the integral curve of $\phi(t)$. As 
indicated in Figure 26, the ordinate of the ogive is zero at $t = -\infty$, 
.5 at t = 0, and the ogive approaches the line A = 1 asymptotically. 

Now imagine the vertical scale of Figure 26 stretched in such a 
way that the ogive becomes a straight line. The stretching required 
will be least around the line A = 0.5, where the ogive is steepest, and 
will gradually increase as the distance from this line increases. 

Paper so ruled that the (t, A) graph is a straight line is called 
probability paper. It is readily obtainable¹ and is convenient for 
many purposes. Thus, by plotting cum f for an observed distribution 
on probability paper, one may observe how closely it approximates 
a straight line and hence get an idea of how nearly normal it 
is. One may thus locate graphically the median, quartiles, etc., and 
estimate frequencies between given limits. 

¹ The Codex Book Company, New York. 
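
The ruling of probability paper amounts to replacing each cumulative proportion A by the normal deviate t that has that proportion below it. The sketch below illustrates this under stated assumptions: it uses `statistics.NormalDist` from the Python standard library, and the function name `probability_scale` and the sample distribution (mean 50, standard deviation 10) are hypothetical.

```python
from statistics import NormalDist

def probability_scale(cum_props):
    """Map cumulative proportions A (0 < A < 1) to normal deviates t."""
    std = NormalDist()          # standard normal: mean 0, sd 1
    return [std.inv_cdf(a) for a in cum_props]

# Hypothetical check: cumulative proportions of an exactly normal
# distribution (mean 50, sd 10) at a few values of x.
xs = [30, 40, 50, 60, 70]
cum = [NormalDist(50, 10).cdf(x) for x in xs]
ts = probability_scale(cum)

# On the stretched scale the points (x, t) lie on a straight line,
# so successive slopes (t2 - t1) / (x2 - x1) are all equal.
slopes = [(ts[i + 1] - ts[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
print([round(s, 4) for s in slopes])
```

For an observed distribution the slopes will not be exactly equal; how far they vary gives a rough numerical counterpart of judging straightness by eye.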

A more complete discussion giving references to writers who suggested 
and developed the use of probability paper may be found in 
the Journal of the American Statistical Association, vol. XXVI, June 
1931, p. 178. 

Exercises 

1. Construct three normal curves on the same axes according to the following 
specifications. Compute ordinates at intervals of $.5\sigma$ from the mean in 
the range $\bar{x} \pm 3\sigma$. 

Curve    $\sigma_x$    $\bar{x}$    N 
A        10            60           400 
B        10            50           800 
C        10            50           1200 


Suggested form for computations: 



2. Construct three normal curves on the same axes according to the following 
specifications. Compute ordinates at intervals of $.5\sigma$ from the mean. 

Curve    $\sigma_x$    $\bar{x}$    N 
A        15            60           1000 
B        10            50           1000 
C        5             60           1000 

Suggestion: Observe that: 

$y_C = 200\,\phi(t)$ 

$y_B = \tfrac{1}{2}\,y_C = 100\,\phi(t)$ 

$y_A = \tfrac{1}{3}\,y_C = \tfrac{200}{3}\,\phi(t).$ 

3. Verify the entries in Tables 29 and 30. 

4. For the following distribution: 

(a) Find the equation of the best fitting normal curve, and plot the curve 
and histogram. 

(b) Find the graduated frequencies. 

mid-x    2    4    6    8    10 
f        1    4    6    4    1 

5. Graduate the distribution in Table 8, §11, Chapter I. Also find the 
ordinates of the best fitting normal curve and plot the curve and histogram. 

6. A distribution of the weekly wages of 906 anthracite miners showed the 
following results: 

$\bar{x} = \$36.13$      $\alpha_3 = 0.007$ 
$\sigma_x = \$8.87$      $\alpha_4 = 3.02$ 

Assuming a normal distribution, estimate the number of the 906 miners 
who received weekly wages (a) in excess of $45, (b) less than $26. 

7. An urban electric railway company operating a large city subway uses 
thousands of electric light bulbs in its underground stations. On January 
1, 1947, the company put into service 5000 new light bulbs. Let it be 
assumed that these 5000 bulbs will have a mean life of 50 days, a standard 
deviation of 19 days, and that their lives conform to the normal 
curve. 

If January 1 is counted as a full day in the life of the bulbs: (a) How 
many bulbs out of the 5000 new ones would have had to be replaced by 
midnight January 31, 1947? (b) How many by March 10, 1947? 

8. Which properties of the normal curve may be used as criteria in passing 
judgment on the normality of an observed distribution? Would you say 
that the distributions referred to in Table 23 are approximately normal? 

9. Graph the ogive of the normal curve by plotting values of (t, A) in the 
range t = ±3, (a) on an algebraic scale, (b) on probability paper. 

10. What famous mathematicians' names are associated with the normal curve? 
When did these men live? Which of them should most appropriately be 
credited with the discovery of this curve? 

11. (Camp) The standard deviation of a certain set of 100,000 high school 
grades was 11%, and the mean grade was 78%. Assume the distribution 
to have been normal, and, being careful not to confuse percentage in the 
sense of grade with a percentage of frequency, answer the following 
questions: How many grades were (a) above 90%, (b) below 70%? (c) What 
was the highest grade of the lowest 1000? (d) Within what limits did the 
middle 90,000 lie? (e) What was the semi-interquartile range? 

12. (Camp) Answer all the questions of Exercise 11 with reference to a set of 
100,000 grades in which the median was 83% and $Q_3$ was 90%. Also 
find $\sigma_x$. 

13. In a certain normal distribution, N = 1000, $\bar{x} = 50$, $\sigma_x = 10$. 
For this distribution: 

(a) Convert the following x's into the corresponding t's. 

x    15  20  25  30  35  40  45  50  55  60  65  70  75  80  85 
t 

(b) Find from the tables the values of $\phi(t)$ for the t values in (a). 

(c) Convert the $\phi(t)$ values obtained in (b) into y values. 

(d) Plot the (x, y) values in (a) and (c) and draw a smooth curve through 
them. 

(e) Find the cumulative relative frequencies, $A = \int_{-\infty}^{t} \phi(t)\,dt$, 
for the values of t in (a). 

(f) Difference your results in (e) by finding $\Delta A$. 

(g) Convert the percentages in (f) into frequencies. 

(h) Explain the meaning of your results in (g) with reference to the figure 
for (d). 

(i) Find the number of variates between x = 42 and x = 74. 

(j) Find the values of x for which cum f = 250, 600, 750, respectively. 

14. Given a normal distribution in which N = 800, $\bar{x} = 40$, $\sigma_x = 7$. 
Find the numerical value of each of the following. 

X t=8 
=0 

15. Suppose N = 5000 variates are normally distributed such that $\bar{x} = 50$ and 
E = 13.49. Without using the tables find the value of the following: 
quartiles, median, mode, standard deviation, mean deviation, x for which 
cum f = 1250. 

16. Suppose there are N values of a variable v which are normally distributed 
with mean = 0 and variance = 25. 

(a) Give the equation of the curve which represents the distribution. 

(b) If there are 793 values between v = -5 and v = 0, determine N. 

(c) What percent of N have values larger than v = 10? 

(d) Determine the value of v for which cum f = .75N. 





















CHAPTER VII 
CURVE FITTING 


1. Empirical Expressions. The preceding chapters have dealt 
with the description and characterization of frequency distributions. 
We have considered three general methods of description: (1) graphical 
devices, (2) the method involving calculation of averages and 
measures of dispersion, (3) the method which is sometimes called 
analytical. This latter method consists in describing the distribution 
by an equation, and we considered only one such analytical expression, 
the normal curve. 

However, another branch of statistics is concerned with data which may 
not be classed under frequency distributions, but which may be described 
by simple equations. 

When one variable is a function of another in applied mathematics the 
mathematical relation between them is not always known. As we mentioned 
in Chapter II, the only information regarding this functional 
relationship may be a set of pairs of values obtained by experimental or 
observational means. These pairs of values may be regarded as 
coordinates of points and plotted. In doing so, the values of the variable 
which is regarded as independent are taken as abscissas, and those of the 
dependent variable as ordinates. 

Example 1. Expectation of Life¹ at various ages 

Age    Expectation 
20     42.20 
30     35.33 
40     28.18 
50     20.91 
60     14.10 
70     8.48 
80     4.39 
90     1.42 

[Figure: expectation of life plotted against age, 20 to 90.] 

The general problem in such cases is to find, if possible, an analytic 
expression of the form y = f(x) for the functional relationship suggested 
by the data. Equations obtained to fit observed data as well 
as possible are called empirical to distinguish them from the rational 
expressions of pure mathematics which can be derived from reasoning. 
This general problem is called curve fitting. It is also sometimes 
referred to as "smoothing" the given data. 

We will consider three types of functions: linear, quadratic, and 
exponential. 

¹ By expectation of life at any age is meant the average number of years lived 
by persons attaining that age, as given in the American Experience Mortality 
Table. 

Example 2. Yearly Production of Cigarettes, United States 

Year    Billions 
1923    66.7 
1924    72.7 
1925    82.3 
1926    92.1 
1927    93.0 
1928    100.6 

[Figure: yearly production in billions plotted against year, 1923 to 1928.] 

2. Linear Functions. We know from algebra that the general form of a 
linear equation in two variables is 

Ax + By = C 

where A, B, and C are arbitrary constants. 

When $B \neq 0$, the equation may be solved for y, giving 
y = -(A/B)x + C/B, which is of the form 

(1) y = mx + k 

and which is the form we will ordinarily use to represent a straight line. 
The special cases where A or B or C is zero are as follows: 

When A = 0, then y = C/B, which is of the form y = k. This is 
a line parallel to the x-axis. When B = 0, the equation takes the 
form x = k, which is a line parallel to the y-axis. When C = 0, then 
Ax + By = 0, which is a line passing through the origin. 

The graph of (1) is a straight line (which explains the term "linear"). 
A characteristic property of a linear function is revealed at once by 
its graph. This is the fact that the ratio of a change in y to the 
corresponding change in x is constant. Thus, if two points $(x_1, y_1)$ 
and $(x_2, y_2)$ are chosen on the line, the value of the ratio 

$\frac{y_2 - y_1}{x_2 - x_1} = m$ 

is independent of the points chosen. This ratio gives the average 
rate of change of any function over the interval $\Delta x = x_2 - x_1$. In 
the case of a linear function, m defines the rate of change of the function. 

Graphically, m is the slope of the line. It is the tangent of the 
angle of inclination $\alpha$ (alpha) which the line makes with the positive 
x-axis.¹ Lines having the same slope are parallel, and conversely. 

It is shown in analytic geometry that we may obtain the slope of a 
straight line from its equation if we solve for y and take the coefficient 
of x. Thus in 2x - y = 5, y = 2x - 5 and the slope is 2. 

Conversely, if we know the slope of a line and the coordinates of any 
point on the line we can write its equation from the relation 

(2) $y - y_1 = m(x - x_1)$ 

which is called the point-slope form of a straight line. Thus, given 
that (2, -1) is a point on a line whose slope is 2, the equation of the 
line is therefore y + 1 = 2(x - 2) or 2x - y = 5. 

Or again, remembering that m is defined by a ratio involving the 
coordinates of two points on a line, we can obtain the equation of a 
line if we know any two points which lie on it. From the definition 
of m and (2), we have 

(3) $y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)$ 

which is known as the two-point form of a straight line. Thus, given 
that (2, -1) and (6, 7) are two points on a line, its equation is 

$y + 1 = \frac{7 + 1}{6 - 2}\,(x - 2)$ or 2x - y = 5. 
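
The two-point form (3) is easily mechanized. The following is a minimal sketch, assuming integer coordinates so that exact fractions can be used; the function name `line_through` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (m, k) for the line y = m*x + k through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the slope m does not exist")
    m = Fraction(y2 - y1, x2 - x1)   # ratio of the change in y to the change in x
    k = y1 - m * x1                  # from the point-slope form (2)
    return m, k

m, k = line_through((2, -1), (6, 7))
print(m, k)   # slope 2 and intercept -5, that is, 2x - y = 5
```

The vertical case is rejected explicitly, since the slope is then undefined.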

3. Quadratic Function. A quadratic function of a variable v 
is a polynomial of the second degree in v which may be expressed in 
the form $Av^2 + 2Bv + C$ where A, B, and C are fixed real numbers. 

¹ When the line is vertical, $\alpha = 90°$ and m does not exist. Then $\Delta x = 0$ and 
division by zero is excluded in our algebra. 
The minimum value of such a function is useful in statistics. We 
have 

$Av^2 + 2Bv + C = \frac{1}{A}\left[A^2v^2 + 2ABv + AC\right]$ 

$= \frac{1}{A}\left[(Av + B)^2 + (AC - B^2)\right].$ 

Since $(Av + B)^2$ is positive or zero and $(AC - B^2)$ does not involve 
the variable, we have the following: 

Theorem I. If A is positive the minimum value of $Av^2 + 2Bv + C$ 
occurs when $Av + B = 0$; the minimum value is $(AC - B^2)/A$. 

The graph of the equation $y = Av^2 + 2Bv + C$, (A > 0), is a parabola 
which opens upward and whose vertex is where $v = -B/A$. Of course the 
function has its minimum value at this vertex, viz.: $(v_0, y_0)$ 
where $v_0 = -B/A$, $y_0 = (AC - B^2)/A$. 
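
Theorem I can be tried numerically. The sketch below is an illustration only: the function name `quadratic_minimum` and the sample polynomial $3v^2 + 12v + 5$ are assumptions introduced here, not taken from the text. Note the convention that the middle coefficient is written 2B.

```python
def quadratic_minimum(A, B, C):
    """Minimum of A*v**2 + 2*B*v + C for A > 0, following Theorem I."""
    if A <= 0:
        raise ValueError("Theorem I assumes A > 0")
    v0 = -B / A                  # from A*v + B = 0
    y0 = (A * C - B * B) / A     # the minimum value
    return v0, y0

# Hypothetical illustration: 3v^2 + 12v + 5, so A = 3, 2B = 12, C = 5.
f = lambda v: 3 * v * v + 12 * v + 5
v0, y0 = quadratic_minimum(3, 6, 5)
print(v0, y0, f(v0))
```

Evaluating the polynomial a little to either side of $v_0$ gives larger values, as the theorem requires.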

Exercises 

1. (Wilson and Tracy) The premium ($y) on a $1000 life insurance policy for 
various ages (x years) is given in the following table. Draw a graph 
exhibiting y as a function of x. Estimate from the graph the premium at 
age 32 and at age 43; also the age at which the premium is $52. 

x    20     25     30     35     40     45     50     55     60 
y    18.78  21.02  23.86  27.54  32.36  38.83  47.68  59.88  76.94 


2. Find an equation of each of the lines through two points given as follows: 
(a) (2, 6), (4, 5); (b) (0, 3), (1, 6). 

3. Find the equation of a line through the point (2, 3) and parallel to the line 
4x + 5y = 7. 

4. (a) Find the value of x for which $f(x) = 2x^2 - 8x + 9$ has a minimum 
value. (b) What is this minimum value? (c) Draw a graph of y = f(x) 
and show the meaning of your answers to (a) and (b). 

5. How would the theorem in §3 be affected if A < 0? 

6. Prove that the second moment of x is a minimum when taken about the 
mean of x. 
Hints. Solution 1. 

Let $f(v) = \frac{1}{N}\sum_{1}^{N}(x_i - v)^2 = \frac{1}{N}\sum_{1}^{N}(x_i^2 - 2vx_i + v^2).$ 

By the theorem of §3, show that f(v) is a minimum when $v = \bar{x}$. 

Solution 2. By definition, 

$\mu_2 = \frac{1}{N}\sum_{1}^{N}(x_i - \bar{x})^2$ 

$\nu_2 = \frac{1}{N}\sum_{1}^{N}(x_i - x_0)^2.$ 

Is $\mu_2 < \nu_2$? 

Solution 3, for calculus students. From f(v) as derived above, 

$f'(v) = -\frac{2}{N}\sum_{1}^{N}(x_i - v).$ 

Set f'(v) = 0 and solve for v. Since f''(v) > 0, $v = \bar{x}$ yields a minimum, 
not a maximum. 

7. Show that the value of k for which 
$f(k) = Nk^2 + 2k\left(m\sum_{1}^{N} x_i - \sum_{1}^{N} y_i\right) + C$ 
is a minimum is defined by 

$m\sum_{1}^{N} x_i + Nk = \sum_{1}^{N} y_i.$ 


4. Fitting a Straight Line. The preceding discussion is intended 
as a basis for the presentation of certain methods of fitting a line to 
data. The equation y = mx + k represents a family or set of 
lines corresponding to different values of the arbitrary constants 
m and k. As noted previously, such constants are called parameters. 
The process of finding the best fitting line for any given data consists 
in determining m and k. By "best fitting" we mean best under a 
criterion of approximation specified by a method. We will consider 
three such methods: (a) graphical, (b) the method of moments of 
ordinates, (c) the method of least squares. 

5. Graphically. A straight line is drawn (preferably with the aid 
of a transparent ruler) to fit as closely as possible the plotted points. 
To find the equation of this line, select two points on the line and 
estimate their coordinates $(x_1, y_1)$ and $(x_2, y_2)$. Substituting these 
coordinates in the "two-point" form of the line (3), we get the desired 
equation. If the first point is chosen so that $x_1 = 0$ the numerical work 
of simplifying the equation is somewhat lessened. 


Example 3. Fit a line graphically to the data in Example 2. 

x           y 
(1923) 0    66.7 
1           72.7 
2           82.3 
3           92.1 
4           93.0 
(1928) 5    100.6 

We take the origin of x at 1923, hence from the figure $(x_1 = 0, y_1 = 67)$ 
and $(x_2 = 5, y_2 = 100)$. 

By equation (3), 

$y - 67 = \frac{100 - 67}{5}\,x.$ 

Therefore, 

y = 6.6x + 67 

is the required equation. 


The graphical method is open to the objection that it depends 
upon the judgment of the investigator. Different people will locate 
the line in different positions and therefore obtain different equations. 
However, where only approximate results are needed it is 
usually quite satisfactory. 

6. Method of Moments. In equation (1) y is not only a function 
of x but it is also a function of the parameters m and k. This 
functional relationship may be expressed symbolically by the notation 
f(x, m, k). Given the functional form of a curve y = f(x, a, b, 
c, ...) the parameters a, b, c, ... may be determined by obtaining 
expressions for as many moments of the computed or functional y's as 
there are parameters in the function and equating these to the numerical 
moments of corresponding order of the observed or empirical y's. 
A solution of the resulting equations, theoretically possible, gives 
the "best" values of the parameters. This is the method of moments 
of ordinates. For a set of N values of $(x_i, y_i)$ the rth moment of y is 
defined by the expression 

$\frac{1}{N}\sum_{i=1}^{N} x_i^r\,y_i$ 

where r is zero or a positive integer. 

In fitting a straight line by this method we obtain two equations 
involving m and k if we equate the zeroth and first moments of the 
observed y's to the zeroth and first moments, respectively, of the y's 
computed from the assumed equation y = mx + k. All moments 
are taken about the origin of x. These two equations may then be 
solved for m and k. The procedure will be made clear by the figure 
and explanation below. 



x        ${}_o y$    ${}_c y$ 
$x_1$    $y_1$       $mx_1 + k$ 
$x_2$    $y_2$       $mx_2 + k$ 
...      ...         ... 
$x_i$    $y_i$       $mx_i + k$ 
...      ...         ... 
$x_N$    $y_N$       $mx_N + k$ 

Suppose we are given N pairs of values of x and y. Denote the 
given or observed y's by ${}_o y$ and the computed y's by ${}_c y$. For the 
observed y's, the first moment is $\frac{1}{N}\sum x_i y_i$, and the zeroth moment is 
$\frac{1}{N}\sum y_i$. By a "computed y" corresponding to any value of x we 
mean the result obtained by substituting that value of x in the equation 
y = mx + k, and solving for y. Thus, for any value of x, say 
$x_i$, we obtain $mx_i + k$ for the corresponding computed y. Graphically, 
it is an ordinate of the line. Therefore, the first moment of 
the computed y's is $\frac{1}{N}\sum x_i(mx_i + k)$, and the zeroth moment is 
$\frac{1}{N}\sum(mx_i + k)$. Applying the principle of moments we have 

                 observed           computed 
zeroth moment    $\sum y_i$       = $\sum(mx_i + k)$ 
first moment     $\sum x_i y_i$   = $\sum x_i(mx_i + k)$ 

where the summations run from 1 to N. 

To solve for m and k we write the preceding equations in the following 
form: 

(4) $m\sum x_i + kN = \sum y_i$ 
    $m\sum x_i^2 + k\sum x_i = \sum x_i y_i.$ 

By determinants, 

(5) $m = \frac{\begin{vmatrix} \sum y & N \\ \sum xy & \sum x \end{vmatrix}}{\begin{vmatrix} \sum x & N \\ \sum x^2 & \sum x \end{vmatrix}} = \frac{(\sum y)(\sum x) - N\sum xy}{(\sum x)^2 - N\sum x^2},$ 

    $k = \frac{\begin{vmatrix} \sum x & \sum y \\ \sum x^2 & \sum xy \end{vmatrix}}{D} = \frac{(\sum x)(\sum xy) - \sum y\sum x^2}{D}.$ 

The determinant D in the expression for k is the same as that in the 
denominator of the expression for m. [In order to solve equations 
(4) for the values (5) it is assumed that D does not vanish.] The 
terms in the expressions for m and k refer to the original data. When 
these expressions have been evaluated they replace m and k in the 
equation y = mx + k. 


Example 4. Find by the method of moments the best fitting line for the data 
in Example 2. 

x     y        xy        $x^2$ 
0     66.7     0         0 
1     72.7     72.7      1 
2     82.3     164.6     4 
3     92.1     276.3     9 
4     93.0     372.0     16 
5     100.6    503.0     25 
15    507.4    1388.6    55     (sums) 

$m = \frac{(507.4)(15) - 6(1388.6)}{(225) - 6(55)} = 6.86, \qquad k = \frac{15(1388.6) - 55(507.4)}{D} = 67.4.$ 

Therefore, 

y = 6.86x + 67.4. 
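
The arithmetic of Example 4 can be reproduced in a few lines. The following is a minimal sketch of formulas (5); the function name `fit_line_moments` and the variable names are introduced here for illustration and are not part of the text.

```python
def fit_line_moments(xs, ys):
    """Solve the two moment equations (4) for m and k by formulas (5)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    d = sx * sx - n * sxx            # the determinant D
    if d == 0:
        raise ValueError("D vanishes (all x equal); (4) has no unique solution")
    m = (sy * sx - n * sxy) / d
    k = (sx * sxy - sy * sxx) / d
    return m, k

# Data of Example 2 with the origin of x taken at 1923.
xs = [0, 1, 2, 3, 4, 5]
ys = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]
m, k = fit_line_moments(xs, ys)
print(round(m, 2), round(k, 1))
```

The guard on D corresponds to the bracketed remark after (5) that D is assumed not to vanish.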


7. An Alternative Procedure. In practice, it is sometimes easier to 
remember the procedure of fitting a line by the method of moments if 
one obtains the equations in (4) directly from the data instead of using 
the formulas for m and k. This will involve the following three steps: 

(a) Substitute each of the given pairs of values in y = mx + k and 
add the corresponding members of the resulting "equations." This 
gives the first equation in (4). 

(b) Multiply each "equation" in (a) by the coefficient of m in 
that "equation" and add the corresponding members of the resulting 
"equations." This gives the second equation in (4). 

(c) Solve the equations simultaneously. This will give the 
required values of m and k. 

The algebraic statements which we designated "equations" (denoting 
that the statements are only approximately true) are called 
observation equations in the theory of errors. A linear combination 
of a set of linear observation equations is a true equation. 

Example. Verify, for the data in Example 2, that the above procedure gives 
the same values of m and k as the formulas. 

Step (a) 

 66.7 =  0m +  k 
 72.7 =  1m +  k 
 82.3 =  2m +  k 
 92.1 =  3m +  k 
 93.0 =  4m +  k 
100.6 =  5m +  k 

507.4 = 15m + 6k 

Step (b) 

  72.7 =   m +   k 
 164.6 =  4m +  2k 
 276.3 =  9m +  3k 
 372.0 = 16m +  4k 
 503.0 = 25m +  5k 

1388.6 = 55m + 15k 

Step (c) 

Solving the equations, we obtain m = 6.86, k = 67.4, as before. 


8. Least Squares. Case I. A standard method of fitting a curve 
to empirical data is one known as the method of least squares. Assume, 
as before, that the plotted data suggest the linear relationship 
y = mx + k. Let d represent the difference between the ordinate of 
any given point and the corresponding ordinate of the line, that is, 
$d_i = y_i - (mx_i + k)$. These differences are called residuals. The 
method of least squares is based upon the following principle. 

Principle of Least Squares. The best estimate of a parameter 
is that for which the sum of weighted squares of the residuals is a 
minimum.¹ 

The sum is to be taken over all the observations that are subject 
to error. We shall assume that the observations are all of equal 
weight; consequently we may let each of the weights be unity. Then 
the parameters m and k are estimated by imposing the condition that 
$\sum_{1}^{N} d_i^2$ be a minimum. Now 

$\sum d^2 = \sum[y - (mx + k)]^2$ 

(6) $= Nk^2 + 2mk\sum x + m^2\sum x^2 - 2k\sum y - 2m\sum xy + \sum y^2.$ 

This is a quadratic polynomial in k. We may write it in the form 

(6a) $f(k) = Nk^2 + 2k\left(m\sum x - \sum y\right) + C$ 

where C represents the terms not involving k. Then according to 
Theorem I the minimum value of f(k) occurs when 

$k = \frac{\sum y - m\sum x}{N},$ 

that is, when 

$Nk + m\sum x - \sum y = 0.$ 

The right member of (6) is also a quadratic polynomial in m. We 
must choose m so that 

$m\sum x^2 + k\sum x - \sum xy = 0.$ 

These last two equations² are the same as (4). When obtained by 
the method of least squares they are called normal equations. Therefore 
the values of m and k in (5) determine the best fitting line by 
both the method of moments and of least squares. It can be shown 
that the two methods give the same result for any polynomial.³ 

¹ For further information about this principle and a discussion of weights, the 
following books are recommended: (a) Reference 4. (b) Statistical Mathematics, 
A. C. Aitken. Oliver and Boyd. 

² The student of calculus would obtain these equations as follows. Let 
$f(m, k) = \sum(y - mx - k)^2$. Then differentiate f(m, k) partially with respect 
to m and k, respectively, and equate the results to zero. 

³ See American Mathematical Monthly, September, 1923. 

It is interesting to observe that the sum of the residuals is zero. 
Thus it can easily be shown that $\sum[y - (mx + k)] = 0$, when the 
values given in (5) are substituted for m and k. This property and 
the fact that the sum of the squares of the residuals is a minimum are 
quite analogous to two similar properties of the arithmetic mean, viz., 

(1) The sum of deviations from the mean is zero. 

(2) The sum of the squares of deviations from the mean is less than 
the sum of the squares of such deviations taken from any other value, 
i.e., $\mu_2 < \nu_2$. 
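
Both properties of the least squares line can be observed numerically. The sketch below is illustrative only; the function names are hypothetical, and the comparison lines are a handful of arbitrarily chosen neighbors of the fitted line rather than a proof.

```python
def least_squares_line(xs, ys):
    """Solve the normal equations for y = m*x + k (Case I)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    k = (sy - m * sx) / n          # from N*k + m*sum(x) - sum(y) = 0
    return m, k

def sum_sq_residuals(xs, ys, m, k):
    return sum((y - (m * x + k)) ** 2 for x, y in zip(xs, ys))

xs = [0, 1, 2, 3, 4, 5]
ys = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]   # data of Example 2
m, k = least_squares_line(xs, ys)

residual_sum = sum(y - (m * x + k) for x, y in zip(xs, ys))
best = sum_sq_residuals(xs, ys, m, k)
# Any other line should give a larger sum of squared residuals.
nearby = [sum_sq_residuals(xs, ys, m + dm, k + dk)
          for dm in (-0.1, 0.1) for dk in (-0.5, 0.5)]
print(abs(residual_sum) < 1e-9, all(s > best for s in nearby))
```

The residual sum is zero only up to rounding in floating-point arithmetic, hence the small tolerance.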

Case II. In Case I distances between the points and the line were 
taken parallel to the y-axis. But we may just as logically, from a 
formal point of view, take distances parallel to the x-axis, and 
make the x residuals the basis for a least squares criterion of best 
fit. Similarly, for the method of moments: we can set up two 
equations such that the first moment of the observed x's equals 
the first moment of the computed x's, and the zeroth moment of the 
observed x's equals the zeroth moment of the computed. To do this let 
$x = m_2y + b$ represent the equation of the line. Then by the principle 
of moments we have 

$\sum x = \sum(m_2y + b)$ 

$\sum xy = \sum y(m_2y + b).$ 

Solving for $m_2$ and b we obtain 

(7) $m_2 = \frac{\sum x\sum y - N\sum xy}{D}, \qquad b = \frac{\sum y\sum xy - \sum x\sum y^2}{D},$ 

where here $D = (\sum y)^2 - N\sum y^2$. 

If we determined $m_2$ and b by making the sum of the squares of 
the x residuals a minimum we would get the results given in (7). 
The expressions in (7) are those of (5) with x and y interchanged. 

In general, Cases I and II will give different lines. Case I assumes 
that the observed points fail to fall on the line because of errors 
in the ordinates only. Case II assumes that only the x-coordinates 
are in error. In the application of curve fitting to economic data, 
etc., the formal mathematical procedure should not be used without 
first verifying that the underlying assumptions involved in the procedure 
are justified. Inasmuch as the independent variable x can 
be controlled in experimental and observational data, the errors 
usually exist only in the y's. 

Therefore, in speaking of the best line by the method of moments 
or least squares it is conventional to mean the line which fits best in 
the sense of (5) rather than (7). 
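
That Cases I and II generally give different lines is easily seen on a small set of numbers. The following sketch uses made-up data, labeled hypothetical, and a single helper (the name `slope_intercept` is an assumption of this illustration) applied once with y as the variable in error and once with x.

```python
def slope_intercept(us, vs):
    """Least squares line v = slope*u + intercept, errors assumed in v only."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    slope = (n * suv - su * sv) / (n * suu - su * su)
    return slope, (sv - slope * su) / n

xs = [1, 2, 3, 4, 5]          # hypothetical data
ys = [2, 1, 4, 3, 5]

m, k = slope_intercept(xs, ys)     # Case I:  y = m*x + k
m2, b = slope_intercept(ys, xs)    # Case II: x = m2*y + b

# Written as y against x, the Case II line has slope 1/m2.
print(round(m, 3), round(1 / m2, 3))
```

For these numbers the two slopes, both expressed as y against x, are 0.8 and 1.25; they would coincide only if the points lay exactly on a line.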

Case III (for calculus students). A third line can be obtained 
which fits best in the sense that the sum of the squares of the 
perpendicular distances from the points to the line is a minimum. 

Let us suppose the equation of this line to be in the form 

$y' = mx' + k$ 

where $x' = x - \bar{x}$, $y' = y - \bar{y}$, and $(\bar{x}, \bar{y})$ is the mean of the 
observed data. The distance $d_i$ from this line to a point $(x_i', y_i')$ 
representing a pair of observed values (referred to their respective 
means as origin) is, from analytics, 

$d_i = \frac{y_i' - mx_i' - k}{\sqrt{m^2 + 1}}.$ 

We wish to make $\frac{1}{N}\sum d_i^2$ a minimum. Therefore we are to choose 
m and k so that the function 

$f(m, k) = \frac{1}{N}\sum\frac{(y_i' - mx_i' - k)^2}{m^2 + 1}$ 

is a minimum. This function may be written in the form 

$f(m, k) = \frac{1}{m^2 + 1}\left(\sigma_y^2 + m^2\sigma_x^2 + k^2 - 2mr\sigma_y\sigma_x\right)$ 

where r is a convenient symbol defined by the relation 

$r\sigma_y\sigma_x = \frac{1}{N}\sum_{1}^{N} x_i'y_i'.$ 

To make f(m, k) a minimum we first put k = 0. Then we equate 
to zero the first derivative with respect to m and obtain 

$m^2r\sigma_y\sigma_x - m(\sigma_y^2 - \sigma_x^2) - r\sigma_y\sigma_x = 0.$ 

Solving for m we have 

$m = \frac{(\sigma_y^2 - \sigma_x^2) \pm \left[(\sigma_y^2 - \sigma_x^2)^2 + 4r^2\sigma_y^2\sigma_x^2\right]^{1/2}}{2r\sigma_y\sigma_x}.$ 

The two roots are the slopes of a pair of perpendicular lines; the upper 
sign, which gives m the same sign as r, yields the minimum. 

Therefore the required equation is y' = mx'. Referred to the origin 
of x and y, this is 

$y - \bar{y} = m(x - \bar{x})$ 

where m is determined above. 

This line is the appropriate one to fit if there are errors in both x 
and y of the empirical data. 
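
The Case III slope can be computed directly from the variances and the product moment. The sketch below is a hedged illustration: it takes the positive square root in the formula for m (an assumption made here so that m has the sign of r), the data are hypothetical, and the function names are introduced only for this example.

```python
import math

def perpendicular_fit(xs, ys):
    """Slope and intercept of the Case III line (perpendicular distances)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    var_x = sum((x - xbar) ** 2 for x in xs) / n
    var_y = sum((y - ybar) ** 2 for y in ys) / n
    # cov plays the part of r * sigma_x * sigma_y in the text
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    if cov == 0:
        raise ValueError("r = 0: the formula for m is indeterminate")
    diff = var_y - var_x
    m = (diff + math.sqrt(diff * diff + 4 * cov * cov)) / (2 * cov)
    return m, ybar - m * xbar

def mean_sq_perp_distance(xs, ys, m, k):
    return sum((y - m * x - k) ** 2 for x, y in zip(xs, ys)) / (len(xs) * (m * m + 1))

xs = [1, 2, 3, 4, 5]          # hypothetical data
ys = [2, 1, 4, 3, 5]
m, k = perpendicular_fit(xs, ys)
print(m, k)
```

For these numbers the slope comes out very nearly 1, which lies between the Case I slope 0.8 and the Case II slope 1.25 for the same data.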

A special problem under Case I. Sometimes problems arise where 
the line to be fitted is restricted in some way. For example, the 
nature of the problem may require that the line shall pass through 
the origin. If this condition is imposed, (1) takes the form 

y = mx. 

The least squares estimate of the slope of this line depends upon 
various assumptions about the errors. If y is subject to error and 
x is free of error, and if the observations are all of equal weight, it is 
easy to show that 

$m = \frac{\sum xy}{\sum x^2}$ 

by the principle of least squares. This principle will give different 
estimates of m under different assumptions about the weights of 
the observations. Several particular solutions of the more general 
problem and some applications will be found in §15 of reference 4 
on page 6. (See also our Exercise 11, p. 189.) 
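
For the equal-weight case the slope of a line through the origin is the ratio of two sums. The following is a minimal sketch with made-up observations; the function name `slope_through_origin` and the data are hypothetical.

```python
def slope_through_origin(xs, ys):
    """Least squares slope of y = m*x, equal weights, errors in y only."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x are zero; the slope is undetermined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx

xs = [1, 2, 3, 4]              # hypothetical observations
ys = [2.1, 3.9, 6.2, 7.8]
m = slope_through_origin(xs, ys)
print(round(m, 2))
```

Other assumptions about the weights would lead to other estimates, as the text notes; this sketch covers only the equal-weight case.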

Exercises 

1. Fit a line to the following data by Case I: 

x    6  7  7  8  8  8  9  9  10 
y    5  6  4  6  4  3  4  3  [illegible] 

Ans. y = -.5x + 8. 


2. Show that $\sum d = 0$ for Exercise 1. 

3. Using the values given in (5) for m and k show that $\sum[y - (mx + k)] = 0$. 

4. Verify the expressions for $m_2$ and b given in (7). How would you modify 
the "alternate procedure" so it will apply to $m_2$ and b? 

5. Fit a line to the data of Example 2 by the method of Case II. 

6. Show that the formulas in (5) fail when the x's are all equal. Hint: Replace 
x by a constant c in the denominator D. 

9. Simplification. The formulas for m and k may be simplified. 
For certain purposes it may be desirable to make the transformations 
$x' = x - \bar{x}$ and $y' = y - \bar{y}$. This has the effect, graphically, of 
translating the origin to the point $(\bar{x}, \bar{y})$ so that the y-axis is moved 
to the value $\bar{x}$, and the x-axis is moved to the value $\bar{y}$. Let the 
equation of the line with reference to these new axes be $y' = m_1x' + k_1$. 
The formulas for $m_1$ and $k_1$ will be the same as for m and k except 
that x will be replaced by x' and y by y'. Hence 

$m_1 = \frac{N\sum x'y' - \sum x'\sum y'}{N\sum x'^2 - (\sum x')^2}, \qquad k_1 = \frac{\sum x'^2\sum y' - \sum x'\sum x'y'}{N\sum x'^2 - (\sum x')^2}.$ 

But since x' is a deviation from the mean of x, $\sum x' = 0$. Similarly, 
$\sum y' = 0$. Hence the values of $m_1$ and $k_1$ reduce to 

(8) $m_1 = \frac{\sum x'y'}{\sum x'^2}, \qquad k_1 = 0.$ 

Therefore the line goes through the new origin, and its equation is 

(9) $y' = m_1x'$ 

where $m_1$ is defined in (8). 

The above transformation may not lighten the computations unless 
the values of x or y are equispaced. However, it does simplify 
the theory in certain applications, particularly in correlation theory 
(Chapter VIII). 

10. Time Series. If one of the variables is time, as in Examples 1 
and 2, the data are called a time series. The best fitting line is then 
commonly called a trend line or trend. In the process of fitting a 
trend line, a first simplification, obviously, is to take the origin at one 
of the given dates as we did in Example 3. But a much greater 
simplification is possible, if the x's are equispaced, as they usually 
are in a time series. Denote the common differences of the x's by c 
and the mid-date by $\bar{x}$. Then we may shift the origin to $\bar{x}$ and change 
the unit of measurement along the horizontal axis to c. Thus we may 
let 

(10) $t = \frac{x - \bar{x}}{c}$ 

where 

(11) $\bar{x} = \frac{x_1 + x_N}{2}$ 

if the x's are equispaced. 

Let us think now of our line in (t, y) coordinates, and let its equation 
be y = at + b. Our problem is to find a and b numerically from 
the given data, as we found m and k before. Our normal equations 
will be 

$\sum y = \sum(at + b)$ 

$\sum ty = \sum t(at + b).$ 

Since $\sum t = \frac{1}{c}\sum(x - \bar{x}) = 0$, and $\sum b = Nb$, the above equations 
are readily solved, giving 

(12) $a = \frac{\sum ty}{\sum t^2}, \qquad b = \frac{\sum y}{N}.$ 

The student should remember that this simplification can be used 
only when the x's are equispaced. 

Example 5. Find the trend line for the following data. Here c = 5, and from 
(11) $\bar{x} = 10$. 

x       y     t     ty     $t^2$ 
0       12    -2    -24    4 
5       15    -1    -15    1 
10      17    0     0      0 
15      22    1     22     1 
20      24    2     48     4 
Sums    90          31     10 

From (12), 

$a = \frac{31}{10} = 3.1, \qquad b = \frac{90}{5} = 18.$ 

So the required equation is y = 3.1t + 18, with reference to the new origin and 
units. If we wish it in terms of x, we substitute 

$t = \frac{x - 10}{5}$ 

and obtain y = .62x + 11.8. 

Example 6. Same as Example 5, with another observation added. Note that 
when there is an even number of observations, the values of t are fractional. 
In this case it is convenient to use the column headings 2ty instead of ty, and 
$4t^2$ instead of $t^2$. 

x       y      t       2ty    $4t^2$ 
0       12     -5/2    -60    25 
5       15     -3/2    -45    9 
10      17     -1/2    -17    1 
15      22     1/2     22     1 
20      24     3/2     72     9 
25      30     5/2     150    25 
Sums    120            122    70 

$\bar{x} = 12.5$, $\sum ty = 61$, $\sum t^2 = 17.5$ 

a = 3.49, b = 20 

y = 3.49t + 20 

$y = 3.49\left(\frac{x - 12.5}{5}\right) + 20$ 

y = .7x + 11.28. 
11. Exponential Trends. When the given y values form a geometric
progression while the corresponding x values form an arithmetic
progression, the relationship between the variables is given
by an exponential function, and the best fitting curve is said to
describe an exponential trend. Data from the fields of biology,
banking, and economics frequently exhibit such a trend. Thus the
growth of bacteria is exponential. Money accumulating at compound
interest follows the same kind of law of growth. And in business,
sales or earnings may grow exponentially over a short period.
Another familiar example is the increase in friction as a rope is
coiled around a post. As the number of coils increases in arithmetic
progression, the friction increases in geometric progression.¹
This explains why a few turns of the hawsers around the bitts at the
wharf are sufficient to hold a large ship.

The characteristic property of this law is that the rate of growth,
that is, the rate of change of y with respect to x, at any value of x is
proportional to the value of the function for that value of x. The
function

(13)    $y = Ae^{Bx}$

has this property.² The letter e is a fixed constant, whereas A and
B are parameters to be determined from the data. If y decreases
as x increases, B is negative. An interesting example of this case is
the disappearance of radioactive substances like radium.



Fig. 28. General Appearance of the Graph of (13) for x ≥ 0 and A > 0.

To assume that the apparent law of growth will continue is usually
unwarranted, so only short range predictions can be made with any
considerable degree of reliability. When the exponential character
of the observed phenomenon ceases, a saturation point is said to be
reached.

¹ C. S. Slichter, Elementary Mathematical Analysis. McGraw-Hill.

² The student of calculus will understand that "rate of change" is used here in
the derivative sense. For (13), dy/dx = By.

The parameters A and B. If we transform (13) so that it is linear
with respect to its parameters we may use the methods for fitting
a straight line to determine A and B. To this end we first take the
logarithms (to base 10) of both sides of (13), obtaining

(14)    log y = log A + (B log e)x

which is of the form

(15)    Y = k + mx

where Y = log y, k = log A, m = B log e.

If we look up the logarithms of the given y's and denote them by Y,
we may fit the equation Y = mx + k to the (x, Y) values by determining
m and k by means of the formulas given in (5). In using
these formulas we must remember to replace y by Y. After m and k
are determined, A and B may be obtained from the relations

A = anti-log of k

B = m/log e, where log e = log 2.718 = .4343.

The student may be interested to verify that the relation Y = mx + k
can be put back into the form (13). We may write (14) in the form

$y = 10^{\log A + (B \log e)x} = Ae^{Bx}.$

The last step follows because $10^{\log N} = N$ by definition of logarithm.

Example 7. Find the exponential trend for the following data, and draw the
curve.

       x        y         Y         xY      x²
       1      1.6     .2041      .2041       1
       2      4.5     .6532     1.3064       4
       3     13.8    1.1399     3.4197       9
       4     40.2    1.6042     6.4168      16
       5    125.0    2.0969    10.4845      25
Sums  15             5.6983    21.8315      55

From (5) we have,

$D = \left(\sum x\right)^2 - N\sum x^2$

$m = \dfrac{1}{D}\left[\sum x \sum Y - N\sum xY\right]$

$k = \dfrac{1}{D}\left[\sum x \sum xY - \sum Y \sum x^2\right].$

Therefore,

D = [(15)² - 5(55)] = -50

m = -(1/50)[(5.6983)(15) - 5(21.8315)] = .4737

k = -(1/50)[15(21.8315) - (5.6983)(55)]
  = -.2813 = 9.7187 - 10.

And

log A = 9.7187 - 10, hence A = .5232

B = m/.4343 = 1.091.

Therefore the required equation is

$y = .5232e^{1.091x}.$
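
The steps of Example 7 may be retraced by machine. The sketch below is only an illustration under stated assumptions: the function name fit_exponential is hypothetical, and the formulas for D, m, and k are those quoted from (5) above, applied to Y = log y (base 10). Because it carries more decimal places than a four-place table, its results may differ from the printed ones in the last digit.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = A*e^(B*x) by fitting the line Y = m*x + k to Y = log10(y)."""
    n = len(xs)
    Ys = [math.log10(y) for y in ys]
    sx = sum(xs)
    sY = sum(Ys)
    sxY = sum(x * Y for x, Y in zip(xs, Ys))
    sxx = sum(x * x for x in xs)
    D = sx ** 2 - n * sxx
    m = (sx * sY - n * sxY) / D
    k = (sx * sxY - sY * sxx) / D
    A = 10 ** k                       # A = anti-log of k
    B = m / math.log10(math.e)        # B = m / log e, with log e = .4343
    return m, k, A, B


# Data of Example 7
m, k, A, B = fit_exponential([1, 2, 3, 4, 5], [1.6, 4.5, 13.8, 40.2, 125.0])
print(round(m, 4), round(k, 4), round(A, 4), round(B, 3))
```

As the footnote to this section cautions, this fits a straight line to the values of log y, which is not quite the same as fitting the exponential to the values of y directly.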


When the x's are equispaced, as here, the work may be simplified by using (10)
and fitting a line

Y = at + b.

The problem now is essentially the same¹ as in §10 where a and b are defined in
(12) except that we are now dealing with (t, Y) coordinates instead of (t, y).

The method is illustrated below. Here $\bar{x}$ = 3.

       t         Y        tY      t²
      -2     .2041    -.4082       4
      -1     .6532    -.6532       1
       0    1.1399     .0000       0
       1    1.6042    1.6042       1
       2    2.0969    4.1938       4
    Sums    5.6983    4.7366      10

¹ The critical reader will realize that fitting a straight line to the values of log y
is not quite the same as fitting an exponential to the values of y. However, the
discrepancy usually does not affect the fit seriously. For a method which is free
from this difficulty, see Glover's Tables, p. 468.














From (12)

$a = \dfrac{\sum tY}{\sum t^2} = \dfrac{4.7366}{10} = .4737, \qquad b = \dfrac{\sum Y}{N} = \dfrac{5.6983}{5} = 1.1397.$

So

Y = .4737t + 1.1397.

Transforming this into (x, Y) coordinates we have

Y = .4737(x - 3) + 1.1397
  = .4737x - .2814

as before.

For purposes of plotting, predicting, or interpolating, values of y in (13) may
be obtained by means of the intermediate form (15). So, to sketch the curve
for this example, we first assign values to x in the last equation, compute the
corresponding values of Y, and then obtain the values of y from a table of
logarithms. These values are given in the following table. The curve in Figure 29
is sketched from the (x, y) values in this table.

Fig. 29

       x         Y         y
       1    0.1923      1.56
       2    0.6660      4.63
       3    1.1397     13.79
       4    1.6134     41.06
       5    2.0871    122.2
       6    2.5608    363.8

12. Further Remarks on the Exponential Function. Equation (13)
is sometimes called the compound interest law because it describes
the way money would grow if interest were compounded continuously.
If P dollars are invested at a nominal rate j% compounded
m times a year, the amount S after x years is given by the formula

$S = P\left(1 + \dfrac{j}{m}\right)^{mx}.$

If j is compounded continuously or, in other words, if m is taken
indefinitely large (written m → ∞), the amount S does not increase
indefinitely but approaches a limiting value. We may write the
expression for S in the form

$S = P\left[\left(1 + \dfrac{j}{m}\right)^{m/j}\right]^{jx}.$

If we let N = m/j, we have

$S = P\left[\left(1 + \dfrac{1}{N}\right)^{N}\right]^{jx}.$

It can be shown in the calculus¹ that, as N → ∞, the quantity
$\left(1 + \dfrac{1}{N}\right)^{N}$ approaches the limit called e. Thus we have

$\lim_{N \to \infty}\left(1 + \dfrac{1}{N}\right)^{N} = e = 2.718\cdots$

This limit is also the base of the Napierian, or natural, system of
logarithms. As m → ∞ so does N → ∞. Therefore in the ideal case
of continuous conversion of interest, we have the limiting form

$S = \lim_{m \to \infty} P\left(1 + \dfrac{j}{m}\right)^{mx} = \lim_{N \to \infty} P\left[\left(1 + \dfrac{1}{N}\right)^{N}\right]^{jx},$

that is

$S = Pe^{jx},$

which is of the form (13).
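
The approach of S to its limiting value can be observed numerically. The following sketch is an illustration only; the function names amount and continuous_amount and the sample figures (P = 100, a nominal rate of 6 per cent, 10 years) are assumptions introduced here, and the rate j is entered as a decimal fraction.

```python
import math


def amount(P, j, m, x):
    """Amount after x years at nominal rate j (a decimal), compounded m times a year."""
    return P * (1 + j / m) ** (m * x)


def continuous_amount(P, j, x):
    """Limiting amount P*e^(j*x) under continuous compounding."""
    return P * math.exp(j * x)


P, j, x = 100.0, 0.06, 10
for m in (1, 2, 12, 365, 10 ** 6):
    # S increases with m but stays below the limiting value
    print(m, amount(P, j, m, x))
print("limit", continuous_amount(P, j, x))
```

The printed amounts increase with m but remain below the limit, in agreement with the statement that S approaches a limiting value rather than increasing indefinitely.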

There are several other forms of the exponential function. For
example, if we let $r = e^{B}$, (13) becomes

$y = Ar^{x}$

which is the general term of a geometric progression whose first term
is A and common ratio is r.

If B is negative in $r = e^{B}$, then r < 1. So (13) is a decreasing
function when B is negative.

If we let $10^{k} = e^{B}$, (13) becomes

$y = A \cdot 10^{kx}.$

Then $k = B \log_{10} e$ and k differs from B by the factor $\log_{10} e$. This
factor is known as the modulus of the system of logarithms of base 10
with respect to the system of base e.

The value of the reciprocal of the modulus

$\dfrac{1}{\log_{10} e} = 2.3025851\cdots$

is often useful. For example, suppose that the logarithm to base e
is required for a given number N and tables to base 10 only are
available. Let $\log_e N = x$. Then $e^{x} = N$, and $x \log_{10} e = \log_{10} N$,
whence $x = \log_{10} N / \log_{10} e = 2.303 \log_{10} N$. (Hereafter, the base 10
will be understood unless otherwise indicated.)

¹ The teacher can give appropriate references.

13. Ratio Charts. In the graphical representation of data that
exhibit an exponential trend, it is often desirable to use semi-logarithmic
paper. Such paper has a logarithmic scale in the vertical direction
and a uniform scale in the horizontal direction. (Figure 30.) A
logarithmic scale is one in which the distance from y = 1 to y = N
equals log N. A "cycle" of rulings spaced according to the logarithms
of the integers from 1 to 10 is the unit of the vertical log y
scale.

"Semi-log" paper may be constructed or purchased having one
or more cycles. The appropriate number of cycles is determined
by the range of y values in the data to be plotted. If the bottom line
of the first cycle is labeled 1 and taken as the origin of log y
(log 1 = 0), the beginning of the next cycle is read 10 (log 10 = 1), the
next one above that is read 100 (log 100 = 2), etc. However, the
beginning of the first cycle may be labeled with any number which
is an integral power (positive or negative) of 10, as .01, .1, 10, 100, etc.
Corresponding lines in successive cycles are labeled with numbers
which are 10 times those in the preceding cycle. Since y has no real
logarithm if y ≤ 0, neither zero nor negative numbers are found on
a logarithmic scale. Plotting a point whose semi-logarithmic
coordinates are (x, y) is equivalent to plotting the point whose
rectangular coordinates are (x, log y).

Example 8. Plot $y = 8(2^{x})$ on semi-log paper.

Solution. Assigning values to x we form the following table,

    x    -3    -2    -1     0     1     2     3     4
    y     1     2     4     8    16    32    64   128

from which we obtain the semi-logarithmic graph shown in Figure 30.

We now state the following theorem.

Theorem II. If A is a positive constant, the (x, log y)-graph of
$y = Ae^{Bx}$ is a straight line.

Proof: Since (15) is linear in x and Y, its graph in (x, Y) rectangular
coordinates is a straight line.
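
Theorem II may be illustrated numerically with the function of Example 8. In the sketch below, which is an illustration and not a proof, the variable names are assumptions; it tabulates Y = log y for equispaced x and shows that successive differences of Y are constant, which is the mark of a straight line in (x, Y) coordinates.

```python
import math

xs = list(range(-3, 5))                 # x = -3, -2, ..., 4
ys = [8 * 2.0 ** x for x in xs]         # y = 8(2^x), as in Example 8
Ys = [math.log10(y) for y in ys]        # Y = log y, the plotted height

# Equal steps in x give equal steps in Y when the (x, Y) graph is a line.
diffs = [Ys[i + 1] - Ys[i] for i in range(len(Ys) - 1)]
print(ys)
print(diffs)
```

Each difference equals log 2, the slope of the line, since one step in x doubles y.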

Semi-logarithmic graphs are also called ratio charts. Their usefulness
depends upon the property of logarithms that

$\log \dfrac{M}{N} = \log M - \log N.$

It follows that the distance between any two ordinates of the chart
measures the ratio between the values represented by these ordinates.
Thus if

$\dfrac{y_1}{y_2} = \dfrac{y_3}{y_4}$

then

$\log y_1 - \log y_2 = \log y_3 - \log y_4$

or

$Y_1 - Y_2 = Y_3 - Y_4,$

that is, equal ratios are represented by equal vertical distances.
Likewise, if

$\dfrac{y_1}{y_2} > \dfrac{y_3}{y_4}$

then

$Y_1 - Y_2 > Y_3 - Y_4$

and the larger ratio is represented graphically by the larger distance.
These differences of elevation are independent of any base line.
The same percentage increase in y is represented by the same addition
to the height of Y in all parts of the chart. Hence, it is easier to
depict and discover percentage changes on ratio charts than on
ordinary charts.

The analysis of time series in economic statistics is often facilitated
by forming "link relatives" which are ratios of each ordinate (after
the first) to the preceding ordinate. Thus, if $y_1, y_2, \cdots, y_n$ are the
given values, the link relatives are

$R_1 = \dfrac{y_2}{y_1}, \quad R_2 = \dfrac{y_3}{y_2}, \quad \cdots, \quad R_{n-1} = \dfrac{y_n}{y_{n-1}}.$

Any link relative R denotes the percentage change in y from one
month (say) to the next. If the y's are plotted on ratio paper they
will lie on a straight line when the R's are equal, on a curve bending
upward when the R's are increasing, and on a curve bending downward
when the R's are decreasing. It follows that if two curves are
parallel on ratio paper their rate of increase (or decrease) is the same.
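
The rule just stated lends itself to a short computation. The following is a minimal sketch; the function names link_relatives and ratio_chart_shape, the tolerance, and the sample series are all assumptions made for illustration, and the sketch presumes positive y's at equispaced dates.

```python
def link_relatives(ys):
    """Ratios of each ordinate (after the first) to the preceding ordinate."""
    return [ys[i] / ys[i - 1] for i in range(1, len(ys))]


def ratio_chart_shape(ys, tol=1e-9):
    """Describe how the y's would appear when plotted on ratio paper."""
    rs = link_relatives(ys)
    steps = [rs[i + 1] - rs[i] for i in range(len(rs) - 1)]
    if all(abs(s) <= tol for s in steps):
        return "straight line"          # equal R's
    if all(s > 0 for s in steps):
        return "bending upward"         # increasing R's
    if all(s < 0 for s in steps):
        return "bending downward"       # decreasing R's
    return "mixed"


print(link_relatives([1, 2, 4, 8]))
print(ratio_chart_shape([1, 2, 4, 8]))
print(ratio_chart_shape([1, 2, 6, 24]))
print(ratio_chart_shape([1, 4, 8, 12]))
```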

For further discussion of ratio charts the student is referred to the 
books of Bivins and Haskell (see §7, Introduction). 

Graphical determination of exponential function. It follows from
Theorem II that data giving a straight line when plotted on
semi-logarithmic paper (with x on the uniform and y on the logarithmic
scale) satisfy an equation of the form (13). Suppose that the
(straight line) graph has been drawn and one desires the exponential
function which the line represents and the data satisfy. The
constants A and B in (13) can be approximated by the following method.¹
We first observe that the slope of the line represented by (15) is
given by

m = B log e.

To determine the numerical value of B, take one cycle of y (over
which the graph extends) from any starting point and read the
corresponding values of x (Figure 31a), so that

$B = \dfrac{1}{\Delta x \log e} = \dfrac{2.303}{\Delta x}.$

Fig. 31, panels (a) and (b)

In case the graph does not extend over one cycle, determine x for
y = e and y = 1; then (Figure 31b)

$B = \dfrac{\log e}{\Delta x \log e} = \dfrac{1}{\Delta x}.$

The sign of B is of course positive if the graph has a positive slope
in the ordinary sense and is negative for a negative slope.

If the graph intersects the line x = 0, the value of A can be read
off at this intersection. If, in the data involved, the graph does not
intersect the line x = 0, A can usually be determined by finding y
for some convenient value of x such that Bx = some integer n,
whereupon $A = y/e^{n}$ from equation (13).

In practical problems, the plotted points representing the data
will not usually fall exactly on a straight line. But if they exhibit
a linear trend one may draw (with the aid of a transparent ruler)
the line that seems to fit them best. Then proceed as above.

¹ ... on Semi-Logarithmic Graphs, W. T. Lenser, The American Mathematical
Monthly, vol. 49 (1942), pp. 611-613.

Example 9. The uniform scale along the horizontal axis of a sheet of
semi-logarithmic paper ranges from 0 to 10; along the vertical axis the logarithmic
scale ranges from 100 to 1000. A straight line is drawn on the paper from the
upper endpoint of the vertical scale to the midpoint of the horizontal scale.
Determine (i) the equation of the exponential function represented by the line,
(ii) the equation of the line in (x, Y) coordinates.

Solution using above method. A = 1000. B = -1/(5 log e) = (-2.3)/5
= -0.46. Hence, the desired equation (i) is

$y = 1000e^{-0.46x}.$

The slope of the line is m = B log e = -1/5 and its equation (ii) is

Y = 3 - 0.2x.

Solution 2. The line goes through the points (0, 1000) and (5, 100).
Substitution of the first pair of coordinates into (13) gives A = 1000. Substitution
of the second pair into $y = 1000e^{Bx}$ gives $100 = 1000e^{5B}$. Then $e^{-5B} = 10$ and
$-5B = \log_e 10 = 2.303$, whence B = -0.46.
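
Solution 2 amounts to passing the curve (13) through two points read from the line, and this is easily mechanized. The sketch below is one way to do so; the function name exponential_through is hypothetical, and both ordinates are assumed positive.

```python
import math


def exponential_through(p1, p2):
    """Return (A, B) for y = A*e^(B*x) passing through two points with y > 0."""
    (x1, y1), (x2, y2) = p1, p2
    B = math.log(y2 / y1) / (x2 - x1)   # natural logarithm of the ratio, per unit of x
    A = y1 / math.exp(B * x1)
    return A, B


# The two points used in Solution 2 of Example 9
A, B = exponential_through((0, 1000), (5, 100))
slope = B * math.log10(math.e)          # m = B log e
intercept = math.log10(A)               # k = log A
print(A, round(B, 4), round(slope, 4), round(intercept, 4))
```

The slope and intercept so obtained are those of the line Y = 3 - 0.2x in (x, Y) coordinates.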

14. Logarithmic Coordinate Paper. A function of the form

(16)    $y = kx^{m}$

is called a power function. If k > 0 we have

(17)    Y = K + mX

where the capital letters denote the logarithms of the corresponding
lower-case letters. Form (17) suggests the usefulness of logarithmic
coordinate paper on which the rulings in both directions are at distances
from the origin that are proportional to the logarithms of
the numbers represented. To mark on this paper a point whose
ordinary coordinates are $(x_1, y_1)$ we plot the point whose rulings
correspond to the numbers $x_1$ and $y_1$.

It is evident from (17) that the graph of (16) is a straight line on
logarithmic coordinate paper. It also follows from (17) that the
problem of fitting a curve of the form (16) to a set of observations
can be reduced to the problem of fitting a straight line.

Example 10. A straight line is drawn on logarithmic coordinate paper through
the points (4, 16) and (6, 54). Determine the function y = f(x) which has that
line as its graph.

Solution 1. Substitution of the coordinates of the given points into (16) gives

$16 = k(4^{m})$

$54 = k(6^{m}).$

Upon dividing each member of the first equation by the corresponding member
of the second, we obtain $8/27 = (2/3)^{m}$ whence by inspection m = 3. Then
k = 1/4, and the required function is $4y = x^{3}$.

Solution 2. Substitution of the logarithms of the given coordinates into (17)
gives

1.20412 = K + 0.60206m

1.73239 = K + 0.77815m.

Solving, m = 3 and K = -.60206 = 9.39794 - 10, k = .25.
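
Solution 2 can be carried out without tables. The following sketch, under the assumption that both coordinates of each point are positive, solves the two equations of the form (17) for m and K; the function name power_through is hypothetical.

```python
import math


def power_through(p1, p2):
    """Return (k, m) for y = k*x^m passing through two points with x, y > 0."""
    (x1, y1), (x2, y2) = p1, p2
    X1, Y1 = math.log10(x1), math.log10(y1)
    X2, Y2 = math.log10(x2), math.log10(y2)
    m = (Y2 - Y1) / (X2 - X1)           # slope of the line Y = K + mX
    K = Y1 - m * X1                     # intercept K = log k
    return 10 ** K, m


# The points of Example 10
k, m = power_through((4, 16), (6, 54))
print(round(k, 4), round(m, 4))
```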

15. Parabolic Trend. Data of broad economic or social significance
extending over a long period of years may often be described
by an arc of a second degree parabola. The equation of a parabola
is of the form

$y = \alpha + \beta x + \gamma x^{2}$

where α, β, γ are the parameters to be determined.

If the x's are equispaced we may let

$t = \dfrac{x - \bar{x}}{c},$

where $\bar{x} = (x_1 + x_N)/2$ and $c = |x_{i+1} - x_i|$, and thereby effect
considerable simplification in evaluating the constants. In t and y
coordinates the equation will, of course, involve different constants
and we may write its equation in the form

(18)    $y = A + Bt + Ct^{2}.$

The method of moments may again be used and since (18) is a polynomial
this method also gives the best fitting curve in a least squares
sense. Because there are three constants to be determined we must
equate the second moments as well as the zeroth and first moments.
Imposing these conditions of moments between the observed and
computed ordinates, we obtain the three normal equations:

$\sum y = NA + B\sum t + C\sum t^{2}$

$\sum ty = A\sum t + B\sum t^{2} + C\sum t^{3}$

$\sum t^{2}y = A\sum t^{2} + B\sum t^{3} + C\sum t^{4}.$

Since the mean is chosen as origin, $\sum t = 0$. With this choice of
origin and because the x's are equispaced it can be shown that
$\sum t^{3} = 0$. Therefore the normal equations simplify into

(19)    $B = \dfrac{\sum ty}{\sum t^{2}}$

        $NA + C\sum t^{2} = \sum y$

        $A\sum t^{2} + C\sum t^{4} = \sum t^{2}y.$

When the summations involved in these equations are evaluated
from the data the values of A, B, and C can easily be determined.

Example 11. Fit a parabola to the following data.

Number of Divorces per 1000 Marriages in the United States
1900-1930

    Year      y      x      t      ty     t²     t²y     t⁴
    1900     81      0     -3    -243      9     729     81
    1905     84      5     -2    -168      4     336     16
    1910     88     10     -1     -88      1      88      1
    1915    104     15      0       0      0       0      0
    1920    134     20      1     134      1     134      1
    1925    148     25      2     296      4     592     16
    1930    170     30      3     510      9    1530     81
    Sums    809                   441     28    3409    196


From (19),

28B = 441

7A + 28C = 809

28A + 196C = 3409.

Solving the last two equations simultaneously we obtain,

$A = \dfrac{322}{3}, \qquad C = \dfrac{173}{84}.$

Therefore,

$y = \dfrac{322}{3} + \dfrac{441}{28}t + \dfrac{173}{84}t^{2}.$

If we desire the equation in the original form we substitute $t = \frac{1}{5}(x - 15)$ and
obtain

$y = \dfrac{322}{3} + \dfrac{441}{28}\left(\dfrac{x - 15}{5}\right) + \dfrac{173}{84}\left(\dfrac{x - 15}{5}\right)^{2}$

which simplifies into

y = 78.62 + .68x + .0824x²

Upon the hypothesis that divorces will continue to increase according to this
trend, we may estimate the number for 1950 for example. When x = 50 in
the above equation, we find y = 318.62.
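
The simplified normal equations (19) are readily solved by machine. The sketch below is an illustration under the assumptions of this section (equispaced x's in increasing order); the function names fit_equispaced_parabola and parabola_in_x are hypothetical. Since it carries the coefficients unrounded, its estimate for x = 50 comes out near 318.5 rather than the 318.62 obtained above from the rounded coefficients.

```python
def fit_equispaced_parabola(xs, ys):
    """Solve the simplified normal equations (19) for y = A + B*t + C*t^2."""
    n = len(xs)
    c = xs[1] - xs[0]                    # common difference of the x's
    x_mid = (xs[0] + xs[-1]) / 2         # mid-value of x
    ts = [(x - x_mid) / c for x in xs]
    s2 = sum(t ** 2 for t in ts)
    s4 = sum(t ** 4 for t in ts)
    sy = sum(ys)
    sty = sum(t * y for t, y in zip(ts, ys))
    st2y = sum(t * t * y for t, y in zip(ts, ys))
    B = sty / s2
    det = n * s4 - s2 * s2               # determinant of the pair of equations in A and C
    A = (sy * s4 - s2 * st2y) / det
    C = (n * st2y - s2 * sy) / det
    return A, B, C, x_mid, c


def parabola_in_x(A, B, C, x_mid, c):
    """Coefficients (alpha, beta, gamma) of y = alpha + beta*x + gamma*x^2."""
    gamma = C / c ** 2
    beta = B / c - 2 * C * x_mid / c ** 2
    alpha = A - B * x_mid / c + C * x_mid ** 2 / c ** 2
    return alpha, beta, gamma


# Data of Example 11
xs = [0, 5, 10, 15, 20, 25, 30]
ys = [81, 84, 88, 104, 134, 148, 170]
A, B, C, x_mid, c = fit_equispaced_parabola(xs, ys)
alpha, beta, gamma = parabola_in_x(A, B, C, x_mid, c)
print(A, B, C)
print(alpha, beta, gamma)
print(alpha + beta * 50 + gamma * 50 ** 2)
```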

16. The Gompertz Curve. The curve which bears his name was
suggested in 1825 by Gompertz for use in actuarial science. Recently
it has had some application as a growth curve in business and
population forecasting and in certain problems in education. Its
equation¹ is

(20)    $y = kg^{c^{x}}.$

To determine the parameters, we first transform (20) into the
logarithmic form

(20a)   $Y = K + Gc^{x}$

where Y = log y, K = log k, G = log g. The number, N, of observations
available must be such that N = 3n where n is the number in
each of three subgroups with no observations omitted; that is, N
must be divided into three blocks of data consisting of n items each.
It is also necessary that the values of the independent variable x be
equispaced. Then the origin can be chosen so that x takes the
values 0, 1, 2, ···, 3n - 1. If these values of x are substituted in
(20a) we obtain the three sets of functional Y's shown in (a), (b),
and (c).

         x          Y           Functional value
(a)      0          $Y_0$         $Y_0 = K + Gc^{0}$
         1          $Y_1$         $Y_1 = K + Gc^{1}$
         ···
         n - 1      $Y_{n-1}$     $Y_{n-1} = K + Gc^{n-1}$
                    Total: $\sum_{i=0}^{n-1} Y_i$

(b)      n          $Y_n$         $Y_n = K + Gc^{n}$
         n + 1      $Y_{n+1}$     $Y_{n+1} = K + Gc^{n+1}$
         ···
         2n - 1     $Y_{2n-1}$    $Y_{2n-1} = K + Gc^{2n-1}$
                    Total: $\sum_{i=n}^{2n-1} Y_i$

(c)      2n         $Y_{2n}$      $Y_{2n} = K + Gc^{2n}$
         2n + 1     $Y_{2n+1}$    $Y_{2n+1} = K + Gc^{2n+1}$
         ···
         3n - 1     $Y_{3n-1}$    $Y_{3n-1} = K + Gc^{3n-1}$
                    Total: $\sum_{i=2n}^{3n-1} Y_i$

¹ For a derivation see Forsyth, Mathematical Theory of Life Insurance.
John Wiley and Sons, Inc.

Let $S_1$, $S_2$, $S_3$ denote respectively the totals of the subgroups (a),
(b), and (c). Thus we have

$S_1 = nK + G(1 + c + \cdots + c^{n-1})$

$S_2 = nK + Gc^{n}(1 + c + \cdots + c^{n-1})$

$S_3 = nK + Gc^{2n}(1 + c + \cdots + c^{n-1}).$

Then

$S_2 - S_1 = G(c^{n} - 1)(1 + c + \cdots + c^{n-1})$

$S_3 - S_2 = Gc^{n}(c^{n} - 1)(1 + c + \cdots + c^{n-1})$

whence we obtain

$c^{n} = \dfrac{S_3 - S_2}{S_2 - S_1}.$

Writing the expression for $S_2 - S_1$ in the form

$S_2 - S_1 = G\,\dfrac{(c^{n} - 1)^{2}}{c - 1}$

and solving for G, we obtain

$G = \dfrac{(S_2 - S_1)(c - 1)}{(c^{n} - 1)^{2}}.$

The expression for $S_1$ may be written

$S_1 = nK + \dfrac{G(1 - c^{n})}{1 - c},$

so we have

$K = \dfrac{1}{n}\left[S_1 - \dfrac{G(1 - c^{n})}{1 - c}\right].$

In the above expressions, $S_1$, $S_2$, $S_3$ denote sums of the functional
Y's. If these are now replaced by the empirical data so that

$S_1 = \sum_{i=0}^{n-1} Y_i, \qquad S_2 = \sum_{i=n}^{2n-1} Y_i, \qquad S_3 = \sum_{i=2n}^{3n-1} Y_i,$

where $Y_i$ refers to the observed Y's, then c can be determined
from the expression for $c^{n}$. Using the value of c, G can be determined,
and then K.
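
The three-group procedure may be sketched in a few lines. The following is an illustration only, not taken from the references cited; the function name fit_gompertz and the trial values k = 100, g = 0.1, c = 0.8 are assumptions. It presumes positive y's at x = 0, 1, ···, 3n - 1, a positive value of $c^{n}$, and c different from 1.

```python
import math


def fit_gompertz(ys):
    """Three-group fit of y = k * g**(c**x) to values observed at x = 0, 1, ..., 3n - 1."""
    N = len(ys)
    if N % 3 != 0:
        raise ValueError("the number of observations must be a multiple of 3")
    n = N // 3
    Ys = [math.log10(y) for y in ys]     # Y = log y
    S1 = sum(Ys[:n])
    S2 = sum(Ys[n:2 * n])
    S3 = sum(Ys[2 * n:])
    cn = (S3 - S2) / (S2 - S1)           # value of c^n
    c = cn ** (1.0 / n)
    G = (S2 - S1) * (c - 1) / (cn - 1) ** 2
    K = (S1 - G * (1 - cn) / (1 - c)) / n
    return 10 ** K, 10 ** G, c           # k, g, c


# Trial data generated from an exact Gompertz curve, so the fit should recover it.
k0, g0, c0 = 100.0, 0.1, 0.8
data = [k0 * g0 ** (c0 ** x) for x in range(9)]
print(fit_gompertz(data))
```

With observed rather than functional values the recovered constants are of course only approximate.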

If c < 1, it is clear from (20a) that Y → K as x → ∞. Then
y = k is an asymptote and k is sometimes called the ceiling of the
curve. (See Figure 32.)

For an application of the above method to a problem in business,
see Statistical Methods (Revised Edition) by Mills, page 672.

17. Remarks and References. The methods of least squares and
moments do not select the appropriate curve. They merely determine
the "best" values of the parameters in the equation of the
curve which has been selected previously to describe the observed
data. The question of the type of curve which should be fitted to the
data is not always easy to answer. The selection of the appropriate
mathematical function depends to a large extent upon the investigator's
experience in the field in which the problem lies and his knowledge
of the properties of curves. It always helps to plot the data first.
The usual requirements for practical purposes are that (a) the curve
must represent well the trend of the empirical data, and (b) the
mathematical expression must not involve too many parameters and
those present must be calculable from the data. In dealing with
time series, if the objective is to find out what would happen if the
percentage change should continue as it has on the average in the past,
then an exponential trend is indicated. If the objective is to find out
what would happen if the yearly (or monthly, etc.) change should
continue as it has in the past, a straight line trend is indicated.

We will merely mention here two other important curves which
require more advanced mathematics in their treatment. The logistic,
or so-called Reed-Pearl curve, is used extensively in studying various
growth phenomena. Its function is of the form

^ a + be* 

and it resembles somewhat the Gompertz curve discussed above. 
For further discussion of this curve and methods of fitting it see 

1. Elements of Statistics — Davis and Nelson. 

2. Statistical Methods, Revised — Mills. 
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The function y = 

is known as Makeham's law. It is used in actuarial work. The student
having a working knowledge of the calculus will find an interesting
discussion of its use in the field of insurance in an article entitled
Makeham's Laws of Mortality, Rietz, American Mathematical
Monthly, vol. 28, p. 471.

The logistic curve was used in studies on the rate of growth of the
population of the United States. But its usefulness in this connection
fell somewhat short, apparently,¹ of the claims of its sponsors.
Two other references relating to the population of our country may
appropriately be mentioned here. Although they do not involve
problems of curve fitting they do afford instructive examples of the
application of scientific method to social and political problems.
They are

1. Bibliography on Methods of Apportionment in Congress, E. V. Huntington.
American Mathematical Monthly, vol. 49 (1942), pp. 115-117.

2. Determination of the Center of Population in the United States, School
Science and Mathematics, May and June, 1942.

Exercises 

1. If the rate of change of y with respect to x is always proportional to the 

attained value of y then y is what kind of a function of x? 

2. Determine A and B in the best fitting curve of the type (13) for the following
data.

           Data            Form for Computations
        x        y          t        Y        tY
        0     1000
        5      100
       10       10
       15        1
       20       .1

3. (a) Prove formula (11). 

(6) Graph the curve y = 10e~**. 

4. Find the best fitting parabola for the following points: (-4, 2), (0, 8), (4, 9),
(8, 11), (12, 8), (16, 5). Ans. y = 7.2 + .94x - .07x².

¹ Harold Hotelling, Differential Equations Subject to Errors and Population
Estimates. Jour. Amer. Stat. Assoc., vol. 22 (1927), pp. 283-314.

5. If the values of t form an arithmetic progression and $\sum t = 0$, prove that
$\sum t^{3} = 0$.

6. (a) Add the values x = 30, y = 37 to the data of Example 6 and find the
trend line. Ans. y = .8x + 10.43.

(b) On the hypothesis that the apparent trend continues, predict the value
of y when x = 35.

7. In a tensile test of a metal bar the following observations were made, where
x represents the load in tons and y the elongation in ten-thousandths of
an inch:

    x      1      2      3      4      5
    y     14     27     40     55     68

Determine a linear relation between x and y by the theory of least squares.

8. In the following table y represents the fire losses in the United States in
millions of dollars. Taking the origin of x at 1915 find the best fitting
line, in a least squares sense, for the data.

    x    1915   1917   1919   1921   1923   1925
    y     172    290    321    495    535    570


9. (a) Add the values x = 6, y = 300 to the data of Example 7 (p. 153) and find
the equation of the best fitting exponential curve.

Ans. Y = .4617x - .2534
$y = .56e^{1.06x}$.

(b) Plot the given data and the curve obtained in (a) on semi-log paper.

10. Distinguish between the forms of the curves represented by the functions 

y = and y = Ke~^^ where A, B, K, and h are positive real num¬ 

bers. If these functions were plotted on semi-log paper what kind of 
curves would be obtained? 

11. Determine by inspection the value of (a) 10*®*io*, (6) 

12. Solve for a;; logio(a;*) = (Iogioa;)(logea;). 

13. Solve for x: logio(a;*) — logio(a;/10) = 2. 

14. Determine a number x such that the square of log x exceeds log x by 2. 

(Logs to base 10. Two answers.) 

15. On semi-logarithmic coordinate paper, a straight line is drawn through the
points (2, 1) and (4, 100). Determine the function which has that line
as its graph. Hint. Use the form $y = Ar^{x}$. Ans. $100y = 10^{x}$.

16. Same as exercise 15 for the points (1, 6) and (2, 18). Ans. $y = 2(3^{x})$.

17. On logarithmic coordinate paper, a straight line is drawn through the points
(2, 12) and (3, 27). Determine the function which has that line as its
graph. Ans. y = 3x².

18. Data from a certain experiment involving voltage (v) as a function of time
(t) are plotted on logarithmic coordinate paper, and are found to exhibit
a linear trend there. A line is drawn, with a transparent ruler, which seems
to fit the plotted data best. Two points on this line are (6, 18) and (8,
32). Determine an equation expressing v in terms of t whose logarithmic
graph is the line.

19. Draw the graph of $y = 25x^{n}$ on logarithmic coordinate paper, (a) when n =
2, (b) when n = -2. Mark scales clearly.

20. The graph of $y = \log_{10} x$ assists one in remembering several important
properties of the logarithms of real numbers. Sketch this graph and
state some of these properties.

21. Read and report on one or more of the references cited in §17. 

Note. Source material for additional exercises on curve fitting 
may be found in the current volumes of the following publications: 

1. Statistical Abstract of the United States. 

2. World Almanac and Book of Facts. 



CHAPTER VIII 
CORRELATION THEORY 

1. The Meaning of Simple Correlation. So far we have been 
concerned with the problems which arise from variation in a single 
variable. We will now consider the simultaneous variation of two 
variables. Methods for disclosing the facts of co-variation and for 
measuring the degree of relationship existing between two variables 
are due mainly to the English biometricians Sir Francis Galton 
(1822-1911) and Karl Pearson (1857-1936). 

Data presenting two sets of related measurements or observations
may arise in many fields of activity yielding N pairs of corresponding
variates $(x_i, y_i)$, i = 1, 2, 3, ···, N. Thus x may represent July rainfall
and y the average yield of corn in a certain section; x may be
an index of commodity prices and y an index of employment over
the same period; we may be interested in a group of school children
in which x is their height and y their weight, or x may refer to their
reading ability and y to their spelling ability; we may be studying the
chance distributions which are obtained in throwing two dice where
x is the number obtained in throws of a single die and y is the number
obtained in throws of the two dice together.

Example 1. In the following set of selected heights (inches), x = stature of
father, y = stature of son.

    x    69   70   69   68   70   73   69   67   69   64
    y    68   69   72   67   70   71   72   66   71   65


Example 2. (Snedecor.) The following data on twelve trees are adapted from
the results of an experiment to test the phenomenon that the injury by codling
moth larvae seems to be greatest on apple trees bearing a small crop. Here
x = hundreds of fruit on a tree, y = percentage of fruits wormy.

    x    15   15   12   26   18   12    8   38   26   19   29   22
    y    52   46   38   37   37   37   34   25   22   22   20   14

Fig. 33

When the given pairs of values are represented by dots locating
the points whose rectangular coordinates are (x, y) we obtain a
so-called "scatter diagram" (Figure 33). The problem is to determine
the degree of association, or correlation as it is called, between the
x's and the corresponding y's, since this indicates the significance of
the relationship.

The field of correlation may be thought of as bounded on the one
extreme by perfect functional dependence and on the other extreme
by complete independence in the probability sense. For example,
the pairs of values which satisfy the equation y = 2x - 5 do not present
a statistical problem. In this case the relationship is defined by a
mathematical function y = f(x). Similarly, at the other extreme we
would not be concerned with pairs of values which are completely
independent in the probability sense, as, for example, the
grades of students in statistics and the heights of their fathers. Two
variables are said to be statistically related when they lie between
these two extremes of relationship.

The theory of correlation is concerned with a twofold problem: 
first with measuring the indicated relationship, and secondly with 
predicting or estimating the average value of y associated with a 
designated value of x. 

2. The Coefficient of Correlation. It is fairly obvious from Figure
33 that with values of x in an assigned interval Δx (Δx small) the
corresponding values of y differ considerably. There is said to be
positive correlation if, for an assigned x larger than $\bar{x}$, the mean of the
corresponding y values is larger than $\bar{y}$ and, for values of x smaller
than $\bar{x}$, the mean of the corresponding values of y is less than $\bar{y}$.
On the other hand, as x increases the tendency may be for y to decrease.
In this case, for an assigned x larger than $\bar{x}$ the mean of the
corresponding y values is less than $\bar{y}$, and for an assigned x less than
$\bar{x}$ the mean of the corresponding y's is greater than $\bar{y}$. There is then
said to be negative correlation. If, for an assigned x taken at random,
a corresponding y is no more likely to be above than below $\bar{y}$, the
variables are independent in the statistical or probability sense and
there is said to be zero correlation between them.

When the variables are correlated there is a tendency for the dots 
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in the scatter diagram to fall into a sort of band having a fairly defi¬ 
nite trend. We are assuming that this trend is linear, and a theory 
built upon this assumption is known as simple or linear correlation. 

In Figure 34 the origin of the x'y'-axes is taken at (J, y). Then 
the points of the scatter diagram are distributed over the four quad¬ 
rants of the aj'y'-plane. 



The coordinates of the points in the four quadrants have algebraic
signs as follows. In quadrant

I, x' and y' are positive;

II, x' is negative and y' is positive;

III, x' and y' are negative;

IV, x' is positive and y' is negative.

Therefore, the product x'y' is positive for all dots which occur in
quadrants I and III and negative for all dots in quadrants II and IV.
The algebraic sum of all such products describes the distribution of
the dots over the quadrants. When this sum is positive the trend
of the dots is through quadrants III and I, when it is negative the
trend is through II and IV, and when zero there is no trend, the dots
being equally distributed over the four quadrants in the sense that
the positive products x'y' balance the negative products. Consequently,
a natural measure of correlation would be obtained by
summing the products x'y' for all the observed values and taking the
average by dividing the result by N. Moreover, if we first express
x' and y' in units of their respective standard deviations we obtain a
measure of correlation which is independent of the original units.
This measure is universally denoted by r. Thus we have in symbols,

(1)    $r = \frac{1}{N}\sum \frac{x'}{\sigma_x}\cdot\frac{y'}{\sigma_y} = \frac{1}{N}\sum \left(\frac{x-\bar{x}}{\sigma_x}\right)\left(\frac{y-\bar{y}}{\sigma_y}\right).$

It is variously called the total correlation, the product-moment
coefficient of correlation, and the correlation coefficient.

We may give the following word definition:

Definition. The correlation coefficient of two sets of variates
expressed in their respective standard deviations as units is the
arithmetic mean of the products of deviations of corresponding values
from their respective means.

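3. Other Formulas for r. Although formula (1) is very useful for
giving the meaning of the correlation coefficient, other formulas
easily obtained from (1) are usually much better adapted to numerical
computation. Since $\sigma_x$ and $\sigma_y$ are constants, (1) may be written
as

(2)    $r = \frac{\frac{1}{N}\sum (x-\bar{x})(y-\bar{y})}{\sigma_x\sigma_y}.$

It is useful to think of this as

       $r = \frac{\text{co-variance}}{[(\text{variance of } x)(\text{variance of } y)]^{1/2}}.$

Formula (2) reduces to

(3)    $r = \frac{\frac{1}{N}\sum xy - \bar{x}\bar{y}}{\left[\frac{1}{N}\sum x^2 - \bar{x}^2\right]^{1/2}\left[\frac{1}{N}\sum y^2 - \bar{y}^2\right]^{1/2}}.$

Theorem I. The value of r is independent of the origin of reference
and the units of measurement.

Proof: Let

       $u = \frac{x - x_0}{h}, \qquad v = \frac{y - y_0}{k}.$

Then

       $x = uh + x_0, \quad y = vk + y_0, \quad \sigma_x = h\sigma_u, \quad \sigma_y = k\sigma_v.$

Substituting in (2) we obtain

(4)    $r = \frac{\frac{1}{N}\sum (u-\bar{u})(v-\bar{v})}{\sigma_u\sigma_v}$

(4a)   $r = \frac{\frac{1}{N}\sum uv - \bar{u}\bar{v}}{\left[\frac{1}{N}\sum u^2 - \bar{u}^2\right]^{1/2}\left[\frac{1}{N}\sum v^2 - \bar{v}^2\right]^{1/2}}.$

Since (4) and (4a) are independent of the constants $x_0$, $y_0$, h, and k,
the theorem is proved.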

This property is of fundamental importance. It means that the units of
measurement for the two sets of observed quantities can be chosen
independently of each other. If the two sets of quantities are of the same kind, the
units need not be the same in both cases; and, what is more important, if the
quantities are of different kinds, so that the units are not comparable at all, the
coefficient r nevertheless may have a definite meaning. (Of course the value of
the coefficient will be affected by a change in the method of measurement of one
of the quantities, such as the substitution of an area for a length in estimating
the size of an object, or the assignment of different relative weights to the
questions on an examination.)

The pairs ($x_i$, $y_i$) may be all distinct or there may be repetitions among them.
But it is necessary to impose the condition that neither $x_i$ nor $y_i$ shall be
constant throughout. This condition is imposed to insure that the denominator
shall not vanish in the various formulas for r. “The Algebra of Correlation,”
Dunham Jackson, American Mathematical Monthly, vol. 31 (1924), pp. 110-121.
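
Theorem I can also be checked numerically. The following Python sketch is an illustration only: the data and the constants x0, h, y0, k are hypothetical values chosen for the demonstration, and the helper name pearson_r is our own. It evaluates formula (2) before and after the change of origin and units, taking h and k positive, as the relations $\sigma_x = h\sigma_u$ and $\sigma_y = k\sigma_v$ presuppose.

```python
import math

def pearson_r(xs, ys):
    """Formula (2): mean product of deviations divided by sigma_x * sigma_y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys) / n)
    return cov / (sigma_x * sigma_y)

# Hypothetical illustrative data (not taken from the text).
x = [61.0, 64.0, 66.0, 69.0, 72.0]
y = [120.0, 135.0, 140.0, 158.0, 170.0]

# Change of origin and unit: u = (x - x0)/h, v = (y - y0)/k, with h, k > 0.
x0, h, y0, k = 60.0, 2.54, 100.0, 0.5
u = [(a - x0) / h for a in x]
v = [(b - y0) / k for b in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(round(r_xy, 6), round(r_uv, 6))  # the two values agree
```

Any other positive h and k, and any origins x0 and y0, should leave the printed pair equal.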

When the given values of x and y are large and a computing machine
is not available, the computations may be lightened by an appropriate
choice of these constants. If only the origin of reference is changed,
then h = k = 1 and $u = x - x_0$, $v = y - y_0$. If the means are taken
as the origin of reference by letting $x' = x - \bar{x}$ and $y' = y - \bar{y}$,
then $\overline{x'} = \overline{y'} = 0$ and the formula becomes,

(5)    $r = \frac{\frac{1}{N}\sum x'y'}{\left[\frac{1}{N}\sum x'^2 \cdot \frac{1}{N}\sum y'^2\right]^{1/2}}.$

A subscript notation should be attached to r when there are several
series of variates. Thus, $r_{xy}$ for the (x, y) series, $r_{xz}$ for the (x, z)
series, $r_{12}$ for the series denoted by ($x_1$, $x_2$), etc.

Example 3. To illustrate the formulas we will compute the value of r for the
following data. Here x = Brokers' Loans in billions of dollars and y = The
Annalist's index of the prices of fifty rail and industrial stocks in 1929. We choose
u = x - 5.00 and v = y - 250.

Month           x      y        u      v        uv       u^2        v^2

J            5.33    248      .33     -2     -0.66     .1089          4
F            5.67    248      .67     -2     -1.34     .4489          4
M            5.65    243      .65     -7     -4.55     .4225         49
A            5.56    249      .56     -1      -.56     .3136          1
M            5.53    235      .53    -15     -7.95     .2809        225
J            5.28    265      .28     15      4.20     .0784        225
J            5.77    282      .77     32     24.64     .5929       1024
A            6.02    303     1.02     53     54.06    1.0404       2809
S            6.35    290     1.35     40     54.00    1.8225       1600
O            6.80    230     1.80    -20    -36.00    3.2400        400
N            4.88    201     -.12    -49      5.88     .0144       2401
D            3.45    206    -1.55    -44     68.20    2.4025       1936

Sums                         6.29      0    159.92   10.7659      10678
(1/N) Sums                  .5242      0   13.3267     .8972   889.8333

Computations: $\sigma_u = [.8972 - (.5242)^2]^{1/2} = .79$,
              $\sigma_v = [889.8333]^{1/2} = 29.83$.

From (4a) we have,

       $r = \frac{13.3267}{(29.83)(.79)} = .57.$

Experienced computers use calculating machines to great advantage
in large-scale computational studies. The following reference is
recommended to students who expect to engage in such work: “The
Calculation of Correlation Coefficients from Ungrouped Data,”
P. S. Dwyer, Journal of the American Statistical Association, vol. 35
(1940), pp. 671-673.
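
For readers with a programmable machine at hand, the arithmetic of Example 3 can be sketched in Python as follows. This is only an illustration of formula (4a) with the coded values u = x - 5.00 and v = y - 250; the variable names are our own and form no part of the text.

```python
import math

# Data of Example 3: x = brokers' loans (billions of dollars),
# y = index of stock prices, monthly for 1929.
x = [5.33, 5.67, 5.65, 5.56, 5.53, 5.28, 5.77, 6.02, 6.35, 6.80, 4.88, 3.45]
y = [248, 248, 243, 249, 235, 265, 282, 303, 290, 230, 201, 206]

n = len(x)
u = [a - 5.00 for a in x]   # coded values, origin moved to 5.00
v = [b - 250 for b in y]    # coded values, origin moved to 250

mean_u = sum(u) / n
mean_v = sum(v) / n
mean_uv = sum(a * b for a, b in zip(u, v)) / n
sigma_u = math.sqrt(sum(a * a for a in u) / n - mean_u ** 2)
sigma_v = math.sqrt(sum(b * b for b in v) / n - mean_v ** 2)

# Formula (4a): r = (mean of uv - mean_u * mean_v) / (sigma_u * sigma_v)
r = (mean_uv - mean_u * mean_v) / (sigma_u * sigma_v)
print(round(sigma_u, 2), round(sigma_v, 2), round(r, 2))
```

The standard deviations printed agree with the values .79 and 29.83 found in the example.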

Exercises 

1. When x' and y' represent deviations from the means,

(a) Show from (1) that $\sum x'y' = N r \sigma_x \sigma_y$.

(b) Show that

2. Derive formula (3) from (2).

3. Show that (3) may be written as

       $r = \frac{N\sum xy - \sum x \sum y}{[\{N\sum x^2 - (\sum x)^2\}\{N\sum y^2 - (\sum y)^2\}]^{1/2}}.$

4. Find r for the data of Example 1.

5. Find r for the data of Example 2.

6. The following data represent the ages of husband (x) and wife (y) of twenty
couples. Find r using (5). Ans. 0.856.

x    22   24   26   [?]  [?]  27   28   28   29   30   30   30   31   [?]  33   34   35   35   36   37
y    18   [?]  [?]  24   22   24   27   24   21   25   29   [?]  27   27   30   27   30   [?]  [?]  [?]

7. In studying a set of pairs of related variates, a statistician has completed the
preliminary arithmetic and obtained the following results:

N = 100; $\sum x^2$ = 1,585,000; $\sum x$ = 12,500; $\sum xy$ = 1,007,425; $\sum y^2$ =
648,100; $\sum y$ = 8,000. Find $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, r.

8. The table in Exercise 2, page 97, contains the grades made on two tests by
twenty-five students in mathematics. Find r for these data. Ans. 0.786.

9. Suggest examples of negative correlation. 

10. In the following anthropometric measurements on a random sample of
twenty male freshmen, taken from the Physical Education Department,

   x       y      z          x       y      z

 68.5    33.6    148       65.3    33.0    136
 67.2    35.0    144       65.1    34.0    144
 67.7    30.2    145       64.8    37.3    170
 63.8    30.0    108       69.6    33.4    154
 69.9    33.0    130       68.2    31.5    122
 64.7    31.0    112       68.8    32.0    141
 68.4    33.0    134       72.3    35.0    159
 66.4    30.2    112       67.8    33.7    134
 69.1    33.3    143       71.3    31.6    136
 71.0    32.3    136       63.5    33.6    126

x represents height, y represents chest measurement, both measurements
being taken to the nearest tenth of an inch, and z represents weight to the
nearest pound. Find the coefficient of correlation (a) between x and y,
(b) between x and z, (c) between y and z.


4. Regression. The properties of r can be studied by fitting a
line to the scatter diagram in such a way as to make the sum of the
squares of the vertical distances from the points to the line a
minimum.

When such a line is referred to the point ($\bar{x}$, $\bar{y}$) as origin, we have
seen (§9, Chapter VII) that its equation is $y' = m_1 x'$ where

       $m_1 = \frac{\sum x'y'}{\sum x'^2}$

and $x' = x - \bar{x}$, $y' = y - \bar{y}$. This value of $m_1$ may easily be
expressed in terms of r and the standard deviations, as follows:

       $m_1 = \frac{N r \sigma_y \sigma_x}{N \sigma_x^2} = r\frac{\sigma_y}{\sigma_x}.$

Therefore, the equation of our line, referred to a system of axes whose
origin is at the means of the variates, is

(6)    $y' = r\frac{\sigma_y}{\sigma_x}x'.$

This is called the regression line of y on x. The term regression
was used first by Galton in studying inheritance of stature. He
found that offspring of abnormally tall or short parents tend to
“step back” or “regress” to the ordinary population height.
However, as now used, regression line has no reference to biometry,
but is merely a convenient term.

By fitting a line $x' = m_2 y'$ to the points of the scatter diagram in
such a way that the sum of the squares of the horizontal distances
from the points to the line shall be a minimum, it is possible to
deduce a second regression line (the regression line of x on y) whose
equation, referred to ($\bar{x}$, $\bar{y}$), is

(7)    $x' = r\frac{\sigma_x}{\sigma_y}y'.$

Note that (7) cannot be obtained by solving for x' in (6). The
two regression lines will coincide if, and only if, r = ±1. From
the equations of the regression lines it is evident that if r > 0, an
increase in the one variable tends to accompany an increase in the
other; if r < 0, an increase in the one will be accompanied by a
decrease in the other.

Equations (6) and (7) are usually expressed in terms of the
original variables x and y instead of the deviations x' and y'. It is
obvious that they may be written as

(8)    $y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$

and

(9)    $x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$

when referred to the origin of x and y.

Equation (8) may be used to estimate values of y corresponding
to designated values of x. Similarly, from equation (9) we may
estimate x for designated values of y. It would be appropriate to
use (8) as a predicting equation when the variation in y is caused or
controlled by the variation in x; (9) would be used when the variation
in x is caused or controlled by the variation in y.

The quantity $m_1 = r(\sigma_y/\sigma_x)$ is called the regression coefficient of y
on x, being the variation in y corresponding to a unit change in x.
Likewise, $m_2 = r(\sigma_x/\sigma_y)$ is called the regression coefficient of x on y.
Thus the numerical value of r is given by $(m_1 m_2)^{1/2}$ but its sign must
be that which is common to the two regression coefficients. The
following quotation from Snedecor (reference 13, list p. 6) sheds light
on the distinction between regression and correlation.

The point of interest here is that r is the geometric mean of the two regression
coefficients. In ordinary units of measurement, therefore, r is an average of the
two regression coefficients used in (i) estimating y from x and (ii) estimating
x from y. This serves to clarify the relation of the two coefficients, correlation
and regression, in measuring relationship. The latter is the appropriate one if
one variable, y, may be designated as dependent upon the other, x. Values of y
may be partly controlled or caused by x, as when variable amounts of some
glandular secretion cause differences in the [...] organisms. Or, y may be
subsequent to x, as weight gain in nutrition [...] the measurement
of initial weight. In such cases, the regression of y on x is usually the statistic
that furnishes the information desired. It is appropriate to attempt to
estimate the value of y from a knowledge of the corresponding value of x.
Correlation, on the other hand, is the appropriate measure of the relation between
two variates like statures of husband and wife. The two heights are known to
be associated through some complex of social and biological causes, but neither
may be looked upon as a consequence of the other. In this sense correlation
is a two-way average of relationship, while regression is directional. Of course,
there are many variables whose relationship may be studied by means of either
correlation or regression, or both. It is necessary only to keep clearly in mind
the character of the relation being considered.

Geometrically, $m_1$ is the slope of line (8) and $1/m_2$ is the slope of
line (9). The two lines intersect at ($\bar{x}$, $\bar{y}$).
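
The use of equations (8) and (9) as predicting equations may be sketched in Python. The summary values below are those quoted in Exercise 4 of the exercises that follow (x = weight in pounds, y = height in inches); the function and variable names are our own and are offered only as an illustration.

```python
def estimate_y_from_x(x, mean_x, mean_y, sigma_x, sigma_y, r):
    """Equation (8): regression of y on x."""
    return mean_y + r * (sigma_y / sigma_x) * (x - mean_x)

def estimate_x_from_y(y, mean_x, mean_y, sigma_x, sigma_y, r):
    """Equation (9): regression of x on y."""
    return mean_x + r * (sigma_x / sigma_y) * (y - mean_y)

# Summary values from Exercise 4: x = weight (lbs.), y = height (in.).
mean_x, mean_y = 150.0, 68.0
sigma_x, sigma_y = 20.0, 2.5
r = 0.60

m1 = r * sigma_y / sigma_x   # regression coefficient of y on x
m2 = r * sigma_x / sigma_y   # regression coefficient of x on y

height_doe = estimate_y_from_x(200.0, mean_x, mean_y, sigma_x, sigma_y, r)
weight_roe = estimate_x_from_y(60.0, mean_x, mean_y, sigma_x, sigma_y, r)
geometric_mean = (m1 * m2) ** 0.5   # numerical value of r

print(height_doe, weight_roe, geometric_mean)
```

The two estimates reproduce the answers printed with Exercise 4 (71.75 in. and 111.6 lbs.), and the geometric mean of the two regression coefficients returns the numerical value of r.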



Exercises 


1. Derive the equation of the line of regression of x on y as suggested above.

2. Find the equations of both lines of regression for Exercise 6 (page 176), and
plot them. Ans. y = .888x - .64,
                x = .825y + 8.55.

3. Using the appropriate equation, find the estimated values of y corresponding
to the given values of x, for Exercise 6 (page 176).

4. Given the following results for the heights and weights of 1000 men students:
$\bar{y}$ = 68.00 in., $\bar{x}$ = 150.00 lbs., r = .60,
$\sigma_y$ = 2.50 in., $\sigma_x$ = 20.00 lbs.

John Doe weighs 200 lbs., Richard Roe is five feet tall.
Estimate the height of Doe from his weight, and the weight of Roe from
his height.

Ans. Doe's height = 71.75 in.
     Roe's weight = 111.6 lbs.

5. (a) Given the following:

$\sum x$ = 150,000, $\sum x^2$ = 22,726,000, $\sum xy$ = 10,622,600,
$\sum y$ = 70,000, $\sum y^2$ = 4,936,000, N = 1000.

Find $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, r, and the lines of regression.

(b) Suppose the data in (a) refer to the weight in pounds (x) and the height
in inches (y) of a sample of 1000 policemen. Suppose Paul Private weighs
160 pounds and Saul Sergeant is 6 feet tall. Estimate the height of
Private and the weight of Sergeant.


5. The Standard Error of Estimate. The average concentration
of the points around the regression line of y on x may be measured
by the expression $\frac{1}{N}\sum d^2$, where d is the difference between an
observed y and the y obtained from the regression line. The value of
$\frac{1}{N}\sum d^2$ will be denoted by $S_y^2$, and $S_y$ is called the standard deviation
of the errors of estimate, or more briefly the standard error of estimate.
The errors of estimate are the deviations of the observed values of y
from the corresponding estimated values. Or to describe them another
way, they are the deviations of the sample y's from the assumed
population y's. It can be shown that $S_y^2 = \sigma_y^2(1 - r^2)$. To prove
this we may write the sum of the squares of the deviations in the
form:

       $\sum d^2 = \sum\left(y' - r\frac{\sigma_y}{\sigma_x}x'\right)^2 = \sum y'^2 - 2r\frac{\sigma_y}{\sigma_x}\sum x'y' + r^2\frac{\sigma_y^2}{\sigma_x^2}\sum x'^2$

       $\qquad\;\; = N\sigma_y^2 - 2Nr^2\sigma_y^2 + Nr^2\sigma_y^2 = N\sigma_y^2(1 - r^2).$

Hence, we have

(10)   $S_y^2 = \sigma_y^2(1 - r^2)$

and

(10a)  $S_y = \sigma_y(1 - r^2)^{1/2}.$

An analogous consideration of the differences between the x's and
the regression line of x on y gives for the square of the standard
error of estimate of the x's

(11)   $S_x^2 = \sigma_x^2(1 - r^2).$
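
Formula (10) may be verified on a small set of figures. The Python sketch below is illustrative only; it borrows the five pairs of values given in the first exercise at the end of §8, and the variable names are our own. It computes $S_y^2$ once directly from the residuals d and once from $\sigma_y^2(1 - r^2)$.

```python
import math

# The five pairs from the first exercise at the end of Section 8.
x = [8, 6, 4, 7, 5]
y = [9, 8, 5, 6, 2]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
var_x = sum((a - mean_x) ** 2 for a in x) / n
var_y = sum((b - mean_y) ** 2 for b in y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
r = cov / math.sqrt(var_x * var_y)

# Way 1: average squared residual about the regression line of y on x.
m1 = cov / var_x
residuals = [(b - mean_y) - m1 * (a - mean_x) for a, b in zip(x, y)]
s2_direct = sum(d * d for d in residuals) / n

# Way 2: formula (10).
s2_formula = var_y * (1 - r ** 2)

print(round(r, 2), round(m1, 2), round(s2_direct, 2), round(s2_formula, 2))
```

Both routes give $S_y^2 = 3.12$ for these figures, which is the value under the radical in the answer printed with that exercise.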

6. Properties of the Correlation Coefficient and Standard Error
of Estimate. Certain properties of r may now be deduced. It is
obvious from (10) that $|r| \le 1$ because both the left member and
$\sigma_y^2$ are positive or zero. Therefore,

       $-1 \le r \le 1.$

If the points all lie exactly on the regression line, the left member of
(10) vanishes and r = ±1. There is then said to be perfect linear
correlation, since the relation between x and y is given exactly by a
linear function. A large numerical value of r means that the
regression lines are close to coincidence and the points in a scatter diagram
cluster closely around the regression lines.

When the regression lines (8) and (9) are expressed in standard
units, they become respectively

(12)   $t_y = r t_x$

and

(13)   $t_x = r t_y$ or $t_y = \frac{1}{r} t_x$

where

       $t_x = \frac{x - \bar{x}}{\sigma_x}$ and $t_y = \frac{y - \bar{y}}{\sigma_y}.$

In this form we see at once that as one variable $t_x$ increases, the other
variable $t_y$ increases (or decreases) to an extent that depends upon r.
Thus r measures co-variation in the variables when they are
expressed in comparable units and when regression is linear.

In standard units, r is the slope of line (12) and 1/r is the slope of
line (13). When r = 0, the regression equations become $t_y = 0$ and
$t_x = 0$ in standard units or $y = \bar{y}$ and $x = \bar{x}$ in the original units.
These are also the equations of the coordinate axes. Therefore, when
r = 0 the regression lines are perpendicular to each other and coincide
with the $t_x$ and $t_y$ axes. When r = 1 the regression equations become
identical and the two lines coincide in quadrants I and III. Similarly,
when r = -1 they coincide in quadrants II and IV. In each
case the coincident lines bisect the quadrants if the equations are
expressed in standard units, but not otherwise unless $\sigma_y = \sigma_x$. The
angle $\theta$ between the regression lines varies from 0° to 90° as r varies
from one to zero.
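
The variation of the angle $\theta$ with r in standard units may be tabulated by a short Python sketch. It rests on a simple observation which is ours rather than the text's: line (12) makes the angle arctan r with the $t_x$-axis and line (13) makes the same angle with the $t_y$-axis, so that for r between 0 and 1 the angle between them is 90° less twice arctan r. The function name is hypothetical.

```python
import math

def angle_between_regression_lines(r):
    """Angle in degrees between t_y = r*t_x and t_x = r*t_y, for 0 <= r <= 1."""
    # Each line is inclined at atan(r) to its own axis, and the axes
    # themselves are 90 degrees apart.
    return 90.0 - 2.0 * math.degrees(math.atan(r))

for r in (0.0, 0.3, 0.6, 0.9, 1.0):
    print(r, round(angle_between_regression_lines(r), 1))
```

The angle falls steadily from 90° at r = 0 to 0° at r = 1, and its tangent equals $(1 - r^2)/2r$, the expression that appears in a later exercise.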

When there is no correlation between x and y then r = 0, and the
variables are said to be independent in the statistical sense. On the
other hand, when r = 0, it is not necessarily true that the variables
are statistically independent. Indeed there may be a high correlation
with non-linear regression when r = 0.¹ (Non-linear regression
will be considered in §21.) Incidentally, the phrase “independent
variables” in the statistical sense should not be confused with the
phrase “independent variables” which is used in the ordinary sense
of analysis to designate the variables on which a specified function
depends. However, the two usages, though quite distinct, are not
fundamentally contradictory, since functional dependence can be
regarded as a limiting case of statistical dependence.
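
That r may vanish when y is completely determined by x can be seen from a small constructed case. In the Python sketch below the data are hypothetical, chosen by us so that y = x² exactly with the x's symmetric about zero; the helper name pearson_r is also our own.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient, formula (2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys) / n)
    return cov / (sigma_x * sigma_y)

# Hypothetical data: y is an exact (non-linear) function of x.
x = [-2, -1, 0, 1, 2]
y = [a ** 2 for a in x]

r = pearson_r(x, y)
print(r)
```

Here the positive and negative products of deviations cancel, so r is zero although the dependence of y on x is perfect.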

¹ See H. L. Rietz, “On Functional Relations for which the Coefficient of Correlation
is Zero,” Journal American Statistical Association, vol. 16, 1919, pp. 472-476.

For an appreciation of the use of $S_y$ in passing judgment upon the
precision to be expected in estimating values of y by means of the
regression equation of y on x, it is instructive to consider
simultaneously the meanings of (8) and (10a) as |r| varies from 0 to 1. When
r = 0, (8) becomes $y = \bar{y}$ which means that the best estimate of y
for any value of x is the mean of the y-distribution. In other words,
knowledge of x is of no value in predicting y. When r = 0 in (10a),
$S_y = \sigma_y$. This is to be expected since the dispersion $S_y$ about the
line $y = \bar{y}$ is the same as the dispersion $\sigma_y$ of the given y's about their
mean. But as |r| increases from 0 to 1, $S_y$ decreases from $\sigma_y$ to 0.
Graphically, the meaning of this improvement in $S_y$ in comparison
with $\sigma_y$, as r increases, is shown in Figure 36 where parallel lines are
drawn at a vertical distance of $S_y$ on either side of the regression line
RR'. For a given value of |r| ≠ 0 this strip encloses the average
dispersion about the line. The strip on either side of $y = \bar{y}$ at a
distance of $\sigma_y$ from it encloses the average dispersion about the line when
r = 0. As |r| increases from 0, the line rotates from the horizontal
position of $y = \bar{y}$ to the terminal position it would have when |r| = 1,
and at the same time $S_y$ decreases toward 0. Formula (10a) tells
us that as |r| thus increases, $S_y$ decreases from $\sigma_y$ in proportion to
$(1 - r^2)^{1/2}$.

Fig. 36. For a Fixed Value of $\sigma_y$, $S_y$ Decreases in Proportion
to $(1 - r^2)^{1/2}$ as r Increases

A similar analysis could be made concerning the line of regression
(9) of x on y which rotates from the vertical position $x = \bar{x}$ when
|r| = 0 to meet and coincide with line (8) when |r| = 1. As line (9)
rotates, $S_x$ decreases from $\sigma_x$ to 0 in proportion to $(1 - r^2)^{1/2}$ as
r increases.

As |r| → 1, (12) and (13) rotate toward each other at equal angular
velocities. When they are coincident their slope is ±1. Lines (8)
and (9) rotate at angular velocities which are proportional to
$m_1 = \tan\alpha$ and $m_2 = \tan\beta$, respectively, where $m_1$ and $m_2$ are
defined in §4. Their slope at coincidence is $\pm\sigma_y/\sigma_x$. For line (12)
it can be shown that

(14)   $\frac{1}{N}\sum \delta^2 = 1 - r^2$

where $\delta$ is the difference between an observed value of $t_y$ and the
ordinate obtained from (12) for the corresponding value of $t_x$. Thus,

       $\frac{1}{N}\sum \delta^2 = \frac{1}{N}\sum (t_y - r t_x)^2 = 1 - 2r^2 + r^2 = 1 - r^2.$

This result would also be apparent from the derivation of (10) since
$\delta = d/\sigma_y$ where d refers to residuals in units other than standard
units.

It is obvious from (14) that the maximum value of $\frac{1}{N}\sum \delta^2$ is unity.
Therefore, adopting

(15)   $1 - \frac{1}{N}\sum \delta^2$

as a measure of goodness of fit, we see from (14) and (15) that $r^2$
is a measure of the goodness of fit of (12) to the points of the scatter
diagram expressed in standard units. By an analogous argument a
similar conclusion concerning (13) can be made.

7. Further Discussion. Given a set of N pairs of x and y correlated
values. Suppose the necessary constants are evaluated to
obtain the regression equation (8). Then if the given values of x
are substituted in this equation, a set of estimated y's, say ${}_E y$, will be
obtained. The mean, ${}_E\bar{y}$, of these estimated y's is the same as the
mean of the observed y's. The proof is as follows. From (8) we have

       ${}_E y_i = \bar{y} + r\frac{\sigma_y}{\sigma_x}(x_i - \bar{x}).$

Then

       $\frac{1}{N}\sum_{1}^{N} {}_E y_i = \frac{1}{N}\sum_{1}^{N}\bar{y} + r\frac{\sigma_y}{\sigma_x}\cdot\frac{1}{N}\sum_{1}^{N}(x_i - \bar{x}).$

But $\sum_{1}^{N}(x_i - \bar{x}) = 0$ by Theorem VI, Chapter III. So ${}_E\bar{y} = \bar{y}$.

1 

We now state the following theorem.

Theorem II. The variance, $\sigma_{Ey}^2$, of the estimated y's equals $r^2\sigma_y^2$.
Proof: By definition,

       $\sigma_{Ey}^2 = \frac{1}{N}\sum ({}_E y_i - {}_E\bar{y})^2.$

From the above discussion, $({}_E y_i - {}_E\bar{y})$ is the same as $({}_E y_i - \bar{y})$ which
is given by (8). So

       $\sigma_{Ey}^2 = \frac{1}{N}\sum r^2\frac{\sigma_y^2}{\sigma_x^2}(x_i - \bar{x})^2 = r^2\frac{\sigma_y^2}{\sigma_x^2}\sigma_x^2.$

Hence

(16)   $\sigma_{Ey}^2 = r^2\sigma_y^2.$

From this theorem and (10) we obtain

(17)   $S_y^2 = \sigma_y^2 - \sigma_{Ey}^2.$

This relation helps to clarify the meaning of r and of $S_y$. It is
conventional to call $\sigma_{Ey}^2$ the variance in y which can be explained from
knowledge of x; that is, which the regression of y on x accounts for.
(In the language of some writers, $\sigma_{Ey}^2$ measures the variation of
regression about the mean.) Therefore, (17) shows that $S_y^2$ is the
variation in y after the accompanying variation in x is duly
discounted. $S_y^2$ is sometimes called the residual variance because it
measures the variation in the dependent variable y which knowledge
of x fails to account for. This relation can be depicted geometrically
by the sides of a right triangle. To standardize the representation
we can take $\sigma_y = 1$ as the diameter of a semicircle within which
is inscribed the right triangle, as in Figure 37. In the figure,
$\cos\theta = \sigma_{Ey}/\sigma_y$. So from (16) we have $\cos\theta = r$. The particular values of
$\theta$ in the figure, found from a table of cosines, are $\theta$ = 36° 52' when
r = .8, and $\theta$ = 25° 50' when r = .9. When r = 1, then $\sigma_{Ey} = \sigma_y$
and the regression of y on x accounts for all the variation in y.
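
The statements of this section (the mean of the estimated y's, Theorem II, and relation (17)) can be checked together on a small example. The following Python sketch is an illustration under our own naming; it again uses the five pairs given in the first exercise at the end of §8.

```python
import math

# The five pairs from the first exercise at the end of Section 8.
x = [8, 6, 4, 7, 5]
y = [9, 8, 5, 6, 2]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
var_x = sum((a - mean_x) ** 2 for a in x) / n
var_y = sum((b - mean_y) ** 2 for b in y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
r = cov / math.sqrt(var_x * var_y)

# Estimated y's from regression equation (8).
slope = r * math.sqrt(var_y) / math.sqrt(var_x)
estimates = [mean_y + slope * (a - mean_x) for a in x]

mean_est = sum(estimates) / n
var_est = sum((e - mean_est) ** 2 for e in estimates) / n        # explained variance
var_resid = sum((b - e) ** 2 for b, e in zip(y, estimates)) / n   # S_y squared

print(round(mean_est, 4), round(var_est, 4), round(var_resid, 4), round(var_y, 4))
```

For these figures the explained variance and the residual variance add to the variance of the observed y's, and the ratio of explained to total variance is $r^2$.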



Theorem III. The correlation between observed and estimated values
of y is the same as that between the observed values of x and y.

Proof: We are to show that

       $\frac{\frac{1}{N}\sum y\,{}_E y - \bar{y}\,{}_E\bar{y}}{\sigma_{Ey}\sigma_y}$

reduces to one of the formulas for r. Substituting the values for
${}_E y$, ${}_E\bar{y}$, $\sigma_{Ey}$ into the above expression and simplifying, we obtain (3).
The details of the proof are left to the student as an exercise.

8. Coefficient of Alienation. A measure of the failure to improve
estimates of y from knowledge of correlation is given by

(18)   $k = (1 - r^2)^{1/2}.$

It is sometimes called the coefficient of alienation. Incidentally, it
is interesting to observe that the functional relation between k and r
is shown, graphically, by a semicircle of unit radius, i.e.,

       $f(r) = (1 - r^2)^{1/2}.$

The formula

       $k' = 1 - (1 - r^2)^{1/2}$

may be called the improvement factor because it shows the decrease
in $S_y/\sigma_y$ as |r| increases. It is clear that

       $k = \left[\frac{\sum d^2}{N\sigma_y^2}\right]^{1/2} = \frac{S_y}{\sigma_y}$ and $k' = 1 - k.$

Table 31 gives¹ values of k and k' for values of r. With no
knowledge of correlation, the best estimate of an individual y is $\bar{y}$. Values
of k' for assigned r's show how much better than this guess is the
estimate of an individual y value with knowledge of correlation.
For example, when r = .5 the column headed k in Table 31 shows
that the standard error $S_y$ is about 87% of $\sigma_y$. Or, from the k'
column, $S_y$ has been reduced only 13% from what it would have
been if $\bar{y}$ had been used for prediction purposes. The third column
thus shows how the prediction value of r varies with r. Thus as |r|
decreases from 1 to .8, $S_y/\sigma_y$ increases from 0 to 60%. Or from
another point of view, as |r| increases from 0 to .8, the error of
estimate is improved by only 40%. A correlation of r = .9 permits
prediction of individual y's only 56% better than a mere guess based
on the mean.

It is fairly obvious that we cannot, with any considerable degree
of reliability, predict from ordinary values of r an individual y for an
assigned x. However, with a large N, we can give a very reliable
prediction of the mean of y values that correspond to an assigned
value of x. This can best be explained from a correlation table
which is used when N is large and which will be explained in the
next section.

Table 31. Values of r and the Corresponding Values of k and k'

   r        k        k'

  .1      .995     .005
  .2      .980     .020
  .3      .954     .046
  .4      .917     .083
  .5      .866     .134
  .6      .800     .200
  .7      .714     .286
  .8      .600     .400
  .9      .436     .564
  .92     .392     .608
  .94     .341     .659
  .96     .280     .720
  .98     .199     .801
 1.00    0.000    1.000
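
The entries of Table 31 follow directly from (18) and from k' = 1 - k, and may be recomputed by a few lines of Python. The sketch below is illustrative only; the function names alienation and improvement are our own.

```python
import math

def alienation(r):
    """Coefficient of alienation, formula (18): k = (1 - r^2)^(1/2)."""
    return math.sqrt(1.0 - r * r)

def improvement(r):
    """Improvement factor: k' = 1 - k."""
    return 1.0 - alienation(r)

values_of_r = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
               0.92, 0.94, 0.96, 0.98, 1.00)

print("   r       k      k'")
for r in values_of_r:
    print(f"{r:5.2f}  {alienation(r):6.3f}  {improvement(r):6.3f}")
```

Other values of r may be added to the tuple to extend the table.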


¹ Constructed from a table of sines and cosines. Letting $r = \cos\theta$,
$\sin\theta = (1 - \cos^2\theta)^{1/2} = (1 - r^2)^{1/2}$.

Exercises

1. Given the following correlated data:

x    8    6    4    7    5
y    9    8    5    6    2

(a) Compute the correlation coefficient.

(b) Find the regression line of y on x.

(c) Find the estimated values of y corresponding to the given values of x.

(d) Compute the standard error $S_y$ of predictions in two different ways.

Ans.   $r = \frac{2.4}{\sqrt{2}\sqrt{6}} = .69$,   $m_1 = \frac{2.4}{2} = 1.2$,   $S_y = \sqrt{3.12} = 1.76.$

Note. In practical work, it is never worth while calculating a
correlation coefficient for so few observations. These fictitious data are given
solely as an exercise on which the student can test his knowledge of the
methodology.

2. Prove that the ratio of variance of the estimated y's (taken about their
mean) to the variance of the given y's is equal to $r^2$.

3. If $S_y^2/\sigma_y^2 = 1 - r^2$ is the percentage of the total variance of y uncontrolled
by knowledge of x, what is the remaining percentage, determined by or
calculable from knowledge of x?

4. What equation is the equivalent mathematical statement for the following
words?

If the respective deviations in each series, x and y, from their means
were expressed in units of standard deviations (that is, if each were
divided by the standard deviation of the series to which it belongs) and
plotted to a scale of standard deviations, the slope of a straight line best
describing the plotted points would be the correlation coefficient r.

5. Given the standard deviations $\sigma_x$ and $\sigma_y$ of two distributions of correlated
variates:

(a) What is the standard error in estimating y from x if r = 0?

(b) By how much is $S_y$ in (a) reduced if r is increased to .25?

(c) How large must r be in order that $S_y$ be one-half as large as in (a)?

(d) What must r be in order that $S_y$ be reduced to one-third its value in (a)?

(e) At what value of r is $S_y$ reduced to zero?

(f) For any value of r, what is the ratio between the standard error of
estimating y from x and the standard deviation of the y-distribution?

6. Evaluate the following statements:

(a) A correlation coefficient less than zero indicates an absence of linear
relationship.

(b) A correlation coefficient of r = .6 indicates twice as close relationship
as a coefficient of r = .3.

7. If all the points lie exactly on the regression line of y on x, show that $S_y = 0$
and hence that r = ±1.

8. Show that $S_y^2$ may be computed by means of the relation

       $N S_y^2 = \ldots$

where the primes denote deviations from the means.

9. (For analytics students.) Show that the tangent of the angle from line (8)
to line (9) is

       $\tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$

and from line (12) to line (13) is

       $\tan\theta = \frac{1 - r^2}{2r}.$

What is the value of $\theta$ when r = 1; when r = 0?

10. The least-squares criterion of best fit requires that $\sum\delta^2$ be a minimum,
where $\delta$ is the distance between the line and a point. Three cases arise
depending on whether
Case I, $\delta$ is measured parallel to the y-axis,
Case II, $\delta$ is measured parallel to the x-axis,
Case III, $\delta$ is measured perpendicular to the line.

We have seen that Case I yields line (12) and that Case II yields line (13).
In Case III the line has no universally accepted name but it may be called
the “geometrically best-fitting line.”

(For calculus students.) For Case III prove the following:

(a) In standard units, the equation of the line is

       $t_y = t_x$ if r > 0
and    $t_y = -t_x$ if r < 0.

Solution. Let the equation of the required line be

       $t_y = m t_x + k.$

Then by analytics,

       $\frac{1}{N}\sum\delta^2 = \frac{1}{N}\sum\left(\frac{t_y - m t_x - k}{\sqrt{1 + m^2}}\right)^2 = \frac{m^2 + 1 - 2mr + k^2}{1 + m^2}.$

To make this a minimum, first put k = 0. Call the result f(m).

       $f(m) = \frac{m^2 + 1 - 2mr}{1 + m^2},$

       $f'(m) = \frac{2m^2 r - 2r}{(1 + m^2)^2}.$

Then

       $f''(m) = \frac{4mr(3 - m^2)}{(1 + m^2)^3}.$

The second derivative will be positive when m and r have the same sign.
Since f(m) is a minimum when m = ±1, we are to take m = 1 when
r > 0 and m = -1 when r < 0.

(b) If r = 0, all lines (for which k = 0) fit equally well. Hint. If r = 0,
f(m) = 1.

(c) $\frac{1}{N}\sum\delta^2 = 1 - |r|$. Hint. What is the value of f(m) when m = ±1?
Note that |r| = r if r > 0 and |r| = -r if r < 0.

(d) Goodness of fit is measured by |r|.

(e) When r = .6 the fit is twice as good as when r = .3.

11. The following query and answer appeared in Biometrics Bulletin, vol. 1,
no. 3, pp. 36-37. Research assignment: Investigate the references
cited in the answer and justify the procedure which is recommended
(under the given hypothesis).

Query. A problem that has bothered me is the fitting of regression 
lines when their position is restricted in some way. For example, suppose 
a test is made of the relationship between the number of fish caught in a 
body of water and the average number which can be caught out of it, with 
a standard amount of fishing. In fitting a regression line to such data, 
we know that the point (0, 0) must fall on the line, since if no fish are 
present certainly none will be caught. In other words, we have one 
point which is free from sampling error. The unique importance of this 
point will, it seems to me, make observations in its neighborhood of
relatively less importance than observations at a distance from it, where
there is no fixed guide-post. Do you know of any treatment of
situations of this sort, by which the best straight (or curved) line could be
fitted to data where there is one point which must be satisfied? The
standard deviation from regression ("standard error of estimate") and
the standard error of the regression would also be available. Or are these 
concepts pertinent in such a question? 

Answer. Deming (§15 and §11 of reference 4) gives both a general 
method and some particular solutions of your problem. Snedecor
(reference 6) opens his Chapter 6 with an illustration of the simple case in
which x is measured without error and the variance of y is constant for
all values of x.

Observations in the neighborhood of (0, 0) may or may not be of less 
importance than those at greater distances; it depends on the variance 
of y. One often finds that this variance increases with x. In fact, there 
are many situations in which it seems reasonable to suppose that in the 
sampled population the standard deviation of y is directly proportional 
to x. If you think this hypothesis is suitable in your fishing, the
appropriate method is to calculate the ratios x/y where x is the number of fish
caught and y is the total number of fish, then apply to them the
statistical procedure suitable for a single variate. George W. Snedecor.

9. Correlation Table. When the sample to be studied is large,
it is more convenient to replace the scatter diagram by a correlation
table. We may divide the xy-plane into rectangles of convenient
size, and all points of the scatter diagram falling within any rectangle
are thought of as being concentrated at the center of this rectangle.
A number is then written within the rectangle to designate the
number of points at its center. A correlation table is therefore a
two-way frequency table exhibiting the frequencies in each class
interval.
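
The grouping just described can be sketched in a few lines. The pairs below are invented for illustration, and the class boundaries (64.5 and 59.5, width 5) are an assumption chosen so that integer grades 65-69 share the class mark 67; neither comes from the text.

```python
from collections import Counter


def correlation_table(pairs, x_lo, y_lo, width_x, width_y):
    """Count (x, y) pairs per cell; each cell is keyed by its class marks (center)."""
    table = Counter()
    for x, y in pairs:
        i = int((x - x_lo) // width_x)      # index of the class interval containing x
        j = int((y - y_lo) // width_y)      # index of the class interval containing y
        x_mid = x_lo + (i + 0.5) * width_x  # center of the rectangle in the x-direction
        y_mid = y_lo + (j + 0.5) * width_y  # center of the rectangle in the y-direction
        table[(x_mid, y_mid)] += 1
    return table


# Hypothetical (daily grade, examination grade) pairs.
pairs = [(66, 61), (68, 72), (71, 74), (83, 77), (86, 81), (88, 84), (87, 83), (93, 91)]
table = correlation_table(pairs, x_lo=64.5, y_lo=59.5, width_x=5, width_y=5)
for cell, count in sorted(table.items()):
    print(cell, count)
```

Every point in a rectangle is treated as if it sat at the center of that rectangle, which is why the keys are class marks rather than the original values.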


Table 32

               65-69 70-74 75-79 80-84 85-89 90-94 95-99
         y \ x    67    72    77    82    87    92    97   f(y)
90-94       92                       1     2     3     1      7
85-89       87                 1     3     8     1     5     18
80-84       82     4     4     6     4     9     1           28
75-79       77     3     3     7     6     4                 23
70-74       72     2     3     5     6     1     1           18
65-69       67     3     2                                    5
60-64       62     1                                          1
          f(x)    13    12    19    20    24     6     6    100



Suppose Table 32 is constructed in this way for a set of average
daily grades (x) and final examination grades (y) of 100 students.
When the data have been thus grouped into classes, the class marks
are regarded as the variate values. Thus in Table 32 there are 9
students whose daily grades are 87 and whose final examination grades
are 82. The last column labeled f(y) represents the distribution
of y variates and the last row labeled f(x) represents the distribution
of x variates. A correlation table is thus a bivariate distribution.
In Table 32 the width of the class interval is the same for x and y, 
but of course this is not generally the case. 

10. Notation. In order to compute r from a correlation table it
will be necessary to develop new notation. Since we are now dealing
with frequencies in both the x-direction and the y-direction, we will
distinguish between them by f(x) and f(y). To be sure, this has
the disadvantage of being the same symbol as that for function, but
from the context no ambiguity should arise.
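Generalizing, a correlation table is of the following form:

$$\begin{array}{c|ccccc|c}
y \backslash x & x_1 & x_2 & x_3 & \cdots & x_n & f(y) = \sum_x f(x, y) \\ \hline
y_m & & & & & & \\
y_{m-1} & & & & & & \\
\vdots & & & f(x, y) & & & \\
y_1 & & & & & & \\ \hline
f(x) = \sum_y f(x, y) & & & & & & N = \sum_y \sum_x f(x, y)
\end{array}$$
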


The rectangles containing the frequencies are called cells. The
frequency in a typical cell is denoted by f(x, y), meaning the frequency
in the cell whose coordinates are x and y, where x and y are the
mid-values of the class intervals. Both columns and rows are
subdistributions of the total frequency N. Each column is a frequency
distribution of y's corresponding to a mid-x value. Similarly, each
row is a frequency distribution corresponding to a mid-y value.
The sum along any row is denoted by $\sum_x f(x, y)$, being the sum of
the frequencies in the (x, y) cells in the x-direction. Since the
marginal total for any row is the total frequency corresponding to
a given value of y, it is therefore written in the column headed f(y).
Thus, in Table 32, for y = 92,

$$\sum_x f(x, y) = \sum_x f(x, 92) = 1 + 2 + 3 + 1 = 7.$$

Similarly, $\sum_y f(x, y)$ denotes a summation in the y-direction of all the
entries in a column, corresponding to a fixed value of x, so it denotes
an entry in the bottom row which contains the f(x) frequencies.
Thus, for x = 67,

$$\sum_y f(67, y) = 4 + 3 + 2 + 3 + 1 = 13.$$
Summarizing,

(19) $\sum_x f(x, y) = f(y); \qquad \sum_y f(x, y) = f(x).$

With regard to N, we may obtain it from a correlation table in
three ways: (1) by adding the entries across the rows and then
totaling the resulting sums in the marginal column labeled f(y);
(2) by adding the entries along the columns and then totaling the
results in the marginal row labeled f(x); (3) by adding the entries
in the cells in any order whatsoever. Hence, the following notation,

(20) $\sum_y \sum_x f(x, y), \qquad \sum_x \sum_y f(x, y), \qquad \sum_{x,y} f(x, y),$

will denote, respectively, the above-named procedures or orders in
summing. From (19) and (20) we have

(21) $N = \sum_y f(y) = \sum_x f(x) = \sum_{x,y} f(x, y).$

We may call f(x) and f(y) the marginal distributions of x and y,
respectively. A correlation table with cell frequencies f(x, y)
uniquely determines the marginal totals f(x) and f(y). The
converse, however, is false. For example, we might replace the four
cell frequencies in the upper right-hand corner of Table 32 by the cell
frequencies

     2  2
     2  4

without disturbing the marginal totals.
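
A short check of (19), (21), and the remark about the converse, using the cell frequencies of Table 32. The list layout (rows from y = 92 down to y = 62, columns from x = 67 to x = 97, zeros for empty cells) and the helper name are choices made for this sketch.

```python
# Cell frequencies of Table 32: rows y = 92, 87, ..., 62; columns x = 67, 72, ..., 97.
cells = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]


def marginals(table):
    """Return (f_x, f_y): column totals and row totals of a two-way table."""
    f_y = [sum(row) for row in table]        # summation in the x-direction
    f_x = [sum(col) for col in zip(*table)]  # summation in the y-direction
    return f_x, f_y


f_x, f_y = marginals(cells)
n_total = sum(f_y)

# Replace the four upper right-hand cells (3 1 / 1 5) by (2 2 / 2 4).
altered = [row[:] for row in cells]
altered[0][5], altered[0][6] = 2, 2
altered[1][5], altered[1][6] = 2, 4

print(f_x, f_y, n_total)
print(marginals(altered) == (f_x, f_y))  # same marginal totals, different cells
```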


11. Means and Variances. We will now express the means in
terms of this notation, taking first the mean of x. From the
fundamental definition, we must multiply each x by its corresponding
frequency in the cells and sum the results, taking the products in any
order whatsoever. Hence,

$$\bar{x} = \frac{1}{N}\sum_{x,y} x\,f(x, y).$$

This may also be written

$$\bar{x} = \frac{1}{N}\sum_x \sum_y x\,f(x, y) = \frac{1}{N}\sum_x x \sum_y f(x, y) = \frac{1}{N}\sum_x x\,f(x).$$

Observe that the x may be moved to the left of $\sum_y$ in the second
expression because x is treated as a constant in a summation
performed with respect to y.

Similarly, we have

$$\bar{y} = \frac{1}{N}\sum_{x,y} y\,f(x, y) = \frac{1}{N}\sum_y \sum_x y\,f(x, y) = \frac{1}{N}\sum_y y \sum_x f(x, y) = \frac{1}{N}\sum_y y\,f(y).$$

The student will observe that the last expression for the mean in each
case is identical with that given for a frequency distribution of one
variable, when allowance is made for the necessity of distinguishing
between variables.

Any column is an x array of y's, so the symbol $\bar{y}_x$ is appropriate
for the mean of a column. Similarly, $\bar{x}_y$ denotes the mean of a y
array of x's, i.e., of a row. We may now state the following theorem.¹

Theorem IV. The mean $\bar{y}$ for the whole table (in the y-direction)
is equal to the mean of the values $\bar{y}_x$ for the several columns when each $\bar{y}_x$
is weighted with the frequency in that column.

Proof: We are required to show that

$$\frac{1}{N}\sum_x \bar{y}_x f(x) = \bar{y}$$

where

$$\bar{y}_x = \frac{1}{f(x)}\sum_y y\,f(x, y).$$

Upon substituting in the first equation the value of $\bar{y}_x$ as given by the
second equation, we have

$$\frac{1}{N}\sum_x \sum_y y\,f(x, y) = \frac{1}{N}\sum_{x,y} y\,f(x, y) = \bar{y}.$$

It is suggested that the student state and prove a similar theorem
concerning x.
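
Theorem IV can be verified on Table 32. In this sketch the table is stored with rows running from y = 92 down to y = 62 and columns from x = 67 to x = 97; the variable names are illustrative.

```python
y_marks = [92, 87, 82, 77, 72, 67, 62]
cells = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
columns = list(zip(*cells))
f_x = [sum(col) for col in columns]
n_total = sum(f_x)

# Mean of each column: (1 / f(x)) * sum over y of y * f(x, y).
column_means = [
    sum(y * f for y, f in zip(y_marks, col)) / fx for col, fx in zip(columns, f_x)
]

# Left member of the theorem: column means weighted by the column frequencies.
weighted_mean = sum(m * fx for m, fx in zip(column_means, f_x)) / n_total

# Right member: the mean of y computed from the marginal distribution f(y).
y_bar = sum(y * sum(row) for y, row in zip(y_marks, cells)) / n_total

print(weighted_mean, y_bar)
```

Both quantities come to 79.7 for this table, up to floating-point rounding.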

In this new notation, the definitions of the variances become

$$\sigma_x^2 = \frac{1}{N}\sum_{x,y}(x - \bar{x})^2 f(x, y) = \frac{1}{N}\sum_x x^2 f(x) - \bar{x}^2;$$

$$\sigma_y^2 = \frac{1}{N}\sum_{x,y}(y - \bar{y})^2 f(x, y) = \frac{1}{N}\sum_y y^2 f(y) - \bar{y}^2.$$

1 This is actually the same as Theorem IX on page 46, but it seems
worthwhile to state and prove it in the new notation.

Exercises

1. Evaluate the following expressions in Table 32.
(a) For x = 82,

$$\sum_y f(x, y), \qquad \sum_y y\,f(x, y), \qquad f(x), \qquad \bar{y}_x.$$

(b) For y = 87,

$$\sum_x f(x, y), \qquad \sum_x x\,f(x, y), \qquad f(y), \qquad \bar{x}_y.$$

2. Refer to Table 27 (Chapter V) and let x be the number of a column. Express
the answers in the third and second lines from the bottom of the table in
terms of the notation of this section. Thus for x = 1,

$$\bar{y}_x = \frac{1}{f(x)}\sum_y y\,f(x, y) = \frac{1}{7}\left[85 + (75)2 + (65)2 + (55)2\right] = 67.86.$$


12. Computation of Means. Just as in the case of a one-way
frequency distribution it was found convenient to choose an
arbitrary origin and take the class interval as the unit, so we now do
likewise. Let

(22) $u = \frac{1}{h}(x - x_0);$ i.e., $x = uh + x_0.$

Hence,

(23) $\bar{x} = \bar{u}h + x_0,$ where $\bar{u} = \frac{1}{N}\sum_u u\,f(u).$

Likewise, let

(24) $v = \frac{1}{k}(y - y_0);$ i.e., $y = vk + y_0,$

whence

(25) $\bar{y} = \bar{v}k + y_0,$ where $\bar{v} = \frac{1}{N}\sum_v v\,f(v).$

where 
Then a suitable form for computing the means of the x's and y's
is as follows:

         u    -3    -2    -1     0     1     2     3
   v y \ x    67    72    77    82    87    92    97   f(v)  v f(v)
   3    92                       1     2     3     1      7      21
   2    87                 1     3     8     1     5     18      36
   1    82     4     4     6     4     9     1           28      28
   0    77     3     3     7     6     4                 23       0
  -1    72     2     3     5     6     1     1           18     -18
  -2    67     3     2                                    5     -10
  -3    62     1                                          1      -3
      f(u)    13    12    19    20    24     6     6    100      54
    u f(u)   -39   -24   -19     0    24    12    18    -28

Computations:

$$\bar{u} = \frac{1}{N}\sum_u u\,f(u) = \frac{-28}{100} = -.28,$$

whence

$$\bar{x} = 82 + 5(-.28) = 80.6.$$

$$\bar{v} = \frac{1}{N}\sum_v v\,f(v) = \frac{54}{100} = .54,$$

whence

$$\bar{y} = 77 + 5(.54) = 79.7.$$

In the table f(v) = f(y) and f(u) = f(x) because u and v are merely different ways
of describing the cells but in no way change the frequencies in those cells.
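
The coding of (22) to (25) can be sketched as follows, using the marginal distributions of Table 32 with origins 82 and 77 and class width 5. The function name `coded_mean` is hypothetical, introduced only for this illustration.

```python
def coded_mean(marks, freqs, origin, width):
    """Return (coded mean, mean) of a grouped distribution.

    Each class mark is replaced by u = (mark - origin) / width, the mean of u
    is taken, and the result is converted back by mean = origin + width * u_bar.
    """
    n = sum(freqs)
    coded = [round((m - origin) / width) for m in marks]  # integers for equal classes
    u_bar = sum(u * f for u, f in zip(coded, freqs)) / n
    return u_bar, origin + width * u_bar


x_marks = [67, 72, 77, 82, 87, 92, 97]
f_x = [13, 12, 19, 20, 24, 6, 6]
y_marks = [92, 87, 82, 77, 72, 67, 62]
f_y = [7, 18, 28, 23, 18, 5, 1]

u_bar, x_bar = coded_mean(x_marks, f_x, origin=82, width=5)
v_bar, y_bar = coded_mean(y_marks, f_y, origin=77, width=5)
print(u_bar, x_bar)
print(v_bar, y_bar)
```

Any other class mark could serve as the origin; the coded mean changes but the final mean does not.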


13. Computation of r. In the expressions of §10 and §11 the
(u, v) coordinates could have been used instead of (x, y). The use of
the former simplifies the computation of r. A preliminary discussion
of certain expressions will help in understanding the formula for r
to be used for a correlation table. Let us consider first the following
expression:

(a) $\sum_{u,v} uv\,f(u, v).$

This means: multiply the f in each cell by the u and v coordinates of
that cell and add the results, proceeding from cell to cell over the
whole table in any order whatsoever. But it may be more
convenient to proceed in a definite order, say down the columns. Then
(a) becomes

(b) $\sum_u \sum_v uv\,f(u, v) = \sum_u u \sum_v v\,f(u, v).$

The expression $\sum_v v\,f(u, v)$ in the right member of (b) means: for
any u (i.e., for any column), multiply each f by its own v and add
the results. Let us denote this sum by V. Then the right member
of (b) means: multiply the V for each column by the u of that
column and add the results, proceeding from column to column
(i.e., summing in the u-direction). We may also obtain the same
result as in (a) by proceeding along the rows. Thus (a) may be
written

(c) $\sum_v \sum_u uv\,f(u, v) = \sum_v v \sum_u u\,f(u, v).$

The expression $\sum_u u\,f(u, v)$ means: for any v (i.e., for any row),
multiply each f in the row by its own u and add the results. If we
call this sum U, then the right member of (c) means: multiply
the U for each row by the v for that row and add the results,
proceeding from row to row (i.e., summing in the v-direction).

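We are now ready to derive the formula for r.

Since we are now dealing with a frequency distribution, the
fundamental definition of r becomes

(26) $r = \dfrac{\frac{1}{N}\sum_{x,y}(x - \bar{x})(y - \bar{y})\,f(x, y)}{\sigma_x \sigma_y}.$

From (22) and (23), we have

$$(x - \bar{x}) = h(u - \bar{u}),$$

and from (24) and (25),

$$(y - \bar{y}) = k(v - \bar{v}).$$

Since (x, y) and (u, v) are merely different notations for the same
cell, we have

$$f(x, y) = f(u, v).$$

For computing purposes, the standard deviations are defined as
follows:

$$\sigma_x = h\sigma_u, \quad \sigma_u^2 = \frac{1}{N}\sum_u u^2 f(u) - \bar{u}^2; \qquad \sigma_y = k\sigma_v, \quad \sigma_v^2 = \frac{1}{N}\sum_v v^2 f(v) - \bar{v}^2.$$

Therefore, (26) becomes

$$r = \dfrac{\frac{1}{N}\sum_{u,v}(u - \bar{u})(v - \bar{v})\,f(u, v)}{\sigma_u \sigma_v}.$$

If now we let

$$U = \sum_u u\,f(u, v), \qquad V = \sum_v v\,f(u, v),$$

then since

$$\sum_{u,v} uv\,f(u, v) = \sum_v v \sum_u u\,f(u, v) = \sum_u u \sum_v v\,f(u, v),$$

the above expression for r may be written in either of the following
ways:

(29) $r = \dfrac{\frac{1}{N}\sum_v vU - \bar{u}\bar{v}}{\sigma_u \sigma_v} = \dfrac{\frac{1}{N}\sum_u uV - \bar{u}\bar{v}}{\sigma_u \sigma_v}.$

The fact that

$$\sum_v vU = \sum_u uV$$

serves as a check in the table.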

The above procedure is illustrated in Table 35.

Table 35. Computation of r for Data of Table 32

         u    -3    -2    -1     0     1     2     3
   v y \ x    67    72    77    82    87    92    97   f(v)  v f(v)  v^2 f(v)     U    vU
   3    92                       1     2     3     1      7      21        63    11    33
   2    87                 1     3     8     1     5     18      36        72    24    48
   1    82     4     4     6     4     9     1           28      28        28   -15   -15
   0    77     3     3     7     6     4                 23       0         0   -18     0
  -1    72     2     3     5     6     1     1           18     -18        18   -14    14
  -2    67     3     2                                    5     -10        20   -13    26
  -3    62     1                                          1      -3         9    -3     9
      f(u)    13    12    19    20    24     6     6    100      54       210         115
    u f(u)   -39   -24   -19     0    24    12    18    -28
  u^2 f(u)   117    48    19     0    24    24    54    286
         V    -7    -3     3     7    30    11    13
        uV    21     6    -3     0    30    22    39    115

Explanation: The table is self-explanatory except possibly the U and V entries.
Recalling that $U = \sum_u u\,f(u, v)$, the first entry in the U column is obtained from
the sum of the following products: 0·1 + 1·2 + 2·3 + 3·1 = 11; the second
entry from (-1)·1 + 0·3 + 1·8 + 2·1 + 3·5 = 24. Since $V = \sum_v v\,f(u, v)$, the first
entry in the V row is obtained from 1·4 + 0·3 + (-1)·2 + (-2)·3 + (-3)·1 = -7.
Similarly, for the other entries.

Computations:

$$\sigma_u^2 = \frac{1}{N}\sum_u u^2 f(u) - \bar{u}^2 = 2.86 - (-.28)^2 = 2.7816, \qquad \sigma_u = \sqrt{2.7816} = 1.67.$$

$$\sigma_v^2 = \frac{1}{N}\sum_v v^2 f(v) - \bar{v}^2 = 2.10 - (.54)^2 = 1.8084, \qquad \sigma_v = \sqrt{1.8084} = 1.34.$$

Therefore from (29) we have

$$r = \frac{1.15 - (-.28)(.54)}{(1.67)(1.34)} = 0.58.$$
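
The whole computation of Table 35 can be reproduced in a few lines. This is a sketch under the coding u = (x - 82)/5, v = (y - 77)/5; the list layout and variable names are choices made here.

```python
import math

u = [-3, -2, -1, 0, 1, 2, 3]   # coded x, columns left to right
v = [3, 2, 1, 0, -1, -2, -3]   # coded y, rows top to bottom
cells = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
columns = list(zip(*cells))
f_v = [sum(row) for row in cells]
f_u = [sum(col) for col in columns]
n = sum(f_v)

# U for each row and V for each column, as in the last columns and rows of Table 35.
U = [sum(uj * f for uj, f in zip(u, row)) for row in cells]
V = [sum(vi * f for vi, f in zip(v, col)) for col in columns]
sum_vU = sum(vi * Ui for vi, Ui in zip(v, U))
sum_uV = sum(uj * Vj for uj, Vj in zip(u, V))   # must agree with sum_vU

u_bar = sum(uj * f for uj, f in zip(u, f_u)) / n
v_bar = sum(vi * f for vi, f in zip(v, f_v)) / n
sigma_u = math.sqrt(sum(uj * uj * f for uj, f in zip(u, f_u)) / n - u_bar ** 2)
sigma_v = math.sqrt(sum(vi * vi * f for vi, f in zip(v, f_v)) / n - v_bar ** 2)

r = (sum_vU / n - u_bar * v_bar) / (sigma_u * sigma_v)   # formula (29)
print(sum_vU, sum_uV, round(r, 2))
```

The agreement of the two sums is the check mentioned after (29).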
14. Remarks on Computation of r. (a) Sign of r. It should be
observed that the sign of r depends on the choice of the positive
direction along each coordinate axis. In Table 35 the origin of
reference is chosen so that the data occur in the first quadrant and the
directions on the (x, y)-axes are the conventional ones. These
directions were preserved in changing to (u, v) coordinates. If we
had reversed the direction of the v-axis by labeling the y values
larger than y = 77 by v = -1, -2, -3, and those less than y = 77
by v = 1, 2, 3, the sign of r would be changed. But if the directions
of both u and v were reversed, the sign of r would be unchanged.

(b) Grouping errors. When N is small, say less than 100, and
the data are grouped into cells, grouping errors are introduced. In
general, the fewer cells used, the greater the errors. These may be
corrected, in part, by applying Sheppard's corrections to $\sigma_u$ and
$\sigma_v$. However, this will not be insisted upon in this course.

(c) Commercial charts. Computations can be expedited by the
use of commercially prepared correlation charts. Several types of
chart are available on the market. In her book (reference 15),
Professor Helen M. Walker explains the merits of two of these which
are recommended. She also gives the following advice to beginners:

"A chart is not a crutch to help the novice. It is a means of
speeding up operations after they are well understood."
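
Remark (a) can be illustrated numerically. The seven pairs below are invented for the purpose, and `pearson_r` is a plain implementation of the product-moment definition for ungrouped data, not a routine from the text.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of paired values (divisor N throughout)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sxy / (sx * sy)


xs = [67, 72, 77, 82, 87, 92, 97]   # hypothetical values
ys = [62, 72, 67, 77, 87, 82, 92]

us = [(x - 82) / 5 for x in xs]     # change of origin and unit
vs = [(y - 77) / 5 for y in ys]

r_xy = pearson_r(xs, ys)
r_uv = pearson_r(us, vs)                                       # same as r_xy
r_one_reversed = pearson_r(us, [-t for t in vs])               # sign changes
r_both_reversed = pearson_r([-t for t in us], [-t for t in vs])  # sign restored
print(r_xy, r_uv, r_one_reversed, r_both_reversed)
```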

Exercises

1. By equation (29), show that r is independent of the choice of origin and of
the units of measurement.

2. In Table 35, evaluate the following sums:

$$\sum_u f(u, 2), \quad \sum_v f(2, v), \quad \sum_u u\,f(u, 1), \quad \sum_v v\,f(1, v), \quad \sum_u \sum_v v\,f(u, v), \quad \sum_{u,v} uv\,f(u, v),$$

$$\frac{1}{f(v)}\sum_u u\,f(u, v) \text{ if } v = 0.$$

3. Derive (29).

4. For the table of heights and weights of 200 freshmen, find r and $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$.
Note that $x_0$, $y_0$, h and k do not need to be determined to compute r, but are
required for the means and standard deviations of x and y.

Heights and Weights of 200 Freshmen
(Heights to Nearest 1/10 Inch; Weights to Nearest 1/2 Pound)

y \ x      90- 100- 110- 120- 130- 140- 150- 160- 170- 180- 190- 200-  f(y)
          99.5                                                  209.5
76-77.9                     1                                             1
74-                                        1    1    1    1               4
72-                         1    1    1    4         1                    8
70-                    1    2    6    7    6    2    1    2    1    1    29
68-                    2    8   17    8    9    2    1    1    1         49
66-                    8   16   14   13    6    2    1              1    61
64-               3    8    7    7    3    3    1    1                   33
62-          1    4    1    7    1                                       14
60-                                                                       0
58-59.9           1                                                       1
f(x)         1    8   20   42   46   32   29    8    6    4    2    2   200

Ans. $\bar{x}$ = 138.45 lbs.; $\bar{y}$ = 67.82 in.
$\sigma_x$ = 19.6 lbs.; $\sigma_y$ = 2.8 in.
r = 0.48.

15. Regression Lines for a Correlation Table. The data of a
correlation table may be thought of as dots lying many deep at the
centers of the several cells. There are, of course, f(x, y) of these in
any cell whose coordinates are (x, y), and f(x) is the total number of
dots in a vertical column whose coordinate is x. Suppose now we
replace all the data in each column by an equal number of data
concentrated at the mean of that column. If we denote the ordinate of
this mean point by $\bar{y}_x$, we have

(30) $\bar{y}_x = \frac{1}{f(x)}\sum_y y\,f(x, y).$

Hence, $\bar{y}_x f(x)$ represents the totality of all the y values in a column.

For each of the columns there will be a value of (30). Taking the
hypothesis that the mean points of the several columns lie
approximately on a straight line $\bar{y}_x = m_1 x + k$, we may find $m_1$ and k under a
least-squares criterion of approximation. If, in applying the criterion,
the square of the difference between the observed mean, ${}_o\bar{y}_x$, and the
computed mean, ${}_c\bar{y}_x$, for each array, viz., $(\bar{y}_x - m_1 x - k)^2$, is weighted
with the number f(x) in the array, it turns out that we get the same
values for $m_1$ and k which we obtained when we fitted the regression
line of y on x to the scatter diagram.

In proving this, the student of calculus¹ would have an easy task
in obtaining the normal equations:

(31) $\sum_x (\bar{y}_x - m_1 x - k)\,f(x) = 0, \qquad \sum_x (\bar{y}_x - m_1 x - k)\,x\,f(x) = 0,$

whose simultaneous solution yields the desired values of $m_1$ and k.
Expanding (31), we have

(32) $\sum_x \bar{y}_x f(x) - m_1 \sum_x x\,f(x) - k\sum_x f(x) = 0, \qquad \sum_x \bar{y}_x x\,f(x) - m_1 \sum_x x^2 f(x) - k\sum_x x\,f(x) = 0.$

Since

$$\sum_x \bar{y}_x f(x) = \sum_x \sum_y y\,f(x, y) = N\bar{y}$$

and

$$\sum_x x\,\bar{y}_x f(x) = \sum_x x \sum_y y\,f(x, y) = \sum_{x,y} xy\,f(x, y),$$

equation (32) becomes

(33) $N\bar{y} - m_1 N\bar{x} - Nk = 0, \qquad \sum_{x,y} xy\,f(x, y) - m_1 \sum_x x^2 f(x) - kN\bar{x} = 0.$

Solving (33) for $m_1$ and k we find

$$k = \bar{y} - m_1\bar{x}, \qquad m_1 = \frac{\sum_{x,y} xy\,f(x, y) - N\bar{x}\bar{y}}{\sum_x x^2 f(x) - N\bar{x}^2} = r\frac{\sigma_y}{\sigma_x},$$

and the equation of our line becomes

$$\bar{y}_x = r\frac{\sigma_y}{\sigma_x}x + \bar{y} - r\frac{\sigma_y}{\sigma_x}\bar{x},$$

that is

(8a) $\bar{y}_x - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}).$

1 Differentiating partially $\sum_x f(x)(\bar{y}_x - m_1 x - k)^2$ with respect to $m_1$ and k
respectively, and setting the results equal to zero, yields equation (31). Instead
of differentiating this expression one may expand it, regard the result as
a quadratic in both $m_1$ and k, and use the theorem of §3, Chapter VII, to
obtain (31).


Therefore, the best-fitting line for the means of the columns
properly weighted, and the best-fitting line for all the dots are one and
the same straight line. But from the point of view of a correlation
table, a regression line is to be regarded as the equation from which
may be estimated the average of all the y's associated with a particular
value of x. In other words, a prediction in the latter case professes
to give only the mean result (Figure 38).

Fig. 38. The Line of Regression of y on x Is the Best-Fitting Line for
the Means of the Columns
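
The identity of the two lines can be checked on the data of Table 36 (grades on a mental test, x, against per cent of standard production, y). The sketch below solves the weighted normal equations (31) for the column means and compares the result with the slope and intercept computed from all N dots; the list layout and names are assumptions of this example.

```python
x_marks = [22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5]
y_marks = [125, 115, 105, 95, 85, 75, 65, 55]
cells = [                      # rows y = 125 ... 55, columns x = 22.5 ... 57.5
    [0, 0, 0, 0, 2, 3, 2, 0],
    [0, 0, 1, 3, 1, 4, 4, 4],
    [0, 0, 5, 7, 8, 11, 8, 7],
    [0, 2, 1, 10, 12, 9, 8, 2],
    [1, 3, 12, 11, 7, 12, 7, 1],
    [2, 1, 5, 6, 16, 8, 5, 0],
    [2, 5, 5, 8, 8, 6, 1, 0],
    [2, 3, 3, 4, 1, 1, 0, 0],
]
columns = list(zip(*cells))
f_x = [sum(col) for col in columns]
n = sum(f_x)
col_means = [sum(y * f for y, f in zip(y_marks, col)) / fx for col, fx in zip(columns, f_x)]

# Weighted least squares through the column means, each weighted by f(x).
sx = sum(x * f for x, f in zip(x_marks, f_x))
sy = sum(m * f for m, f in zip(col_means, f_x))
sxx = sum(x * x * f for x, f in zip(x_marks, f_x))
sxy = sum(x * m * f for x, m, f in zip(x_marks, col_means, f_x))
m1_means = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
k_means = (sy - m1_means * sx) / n

# Regression of y on x using every dot: slope = covariance / variance of x.
x_bar = sx / n
y_bar = sum(y * sum(row) for y, row in zip(y_marks, cells)) / n
cov = sum(x * y * cells[i][j] for i, y in enumerate(y_marks)
          for j, x in enumerate(x_marks)) / n - x_bar * y_bar
var_x = sxx / n - x_bar ** 2
m1_all = cov / var_x
k_all = y_bar - m1_all * x_bar

print(round(m1_means, 3), round(k_means, 2))
print(round(m1_all, 3), round(k_all, 2))
```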

16. Applications. The data of a correlation table are usually
regarded as a sample of the much larger class of similar data
constituting the universe. A regression equation calculated from a limited
but representative sample may give valuable estimates of the average
values of y in the universe associated with designated values of x.

Let us consider the data of Table 36. Suppose a
personnel manager in charge of hiring employees of a manufacturing
plant has instituted a system of mental tests for applicants, and has
gathered these data showing the relationship between the standing
made by applicants on their mental tests and their productive ability
when measured according to a certain standard of production after
they are hired.

Table 36

          x  22.5  27.5  32.5  37.5  42.5  47.5  52.5  57.5   f(y)     x̄_y
    y v \ u    -4    -3    -2    -1     0     1     2     3
  125     4                             2     3     2             7    47.5
  115     3                 1     3     1     4     4     4     17    48.1
  105     2                 5     7     8    11     8     7     46    45.9
   95     1           2     1    10    12     9     8     2     44    44.0
   85     0     1     3    12    11     7    12     7     1     54    40.7
   75    -1     2     1     5     6    16     8     5           43    41.6
   65    -2     2     5     5     8     8     6     1           35    38.0
   55    -3     2     3     3     4     1     1                 14    33.2
f(x) = f(u)     7    14    32    49    55    54    35    14    260
        ȳ_x  67.9  72.1  81.9  84.8  85.7  90.9  95.6 105.0

Here x represents the grade made on mental test, and y the per cent of standard
in production. (See also Table 27.) The means of columns are denoted by $\bar{y}_x$,
and the means of rows by $\bar{x}_y$.

In order to demonstrate to the company's management the
connection between his mental tests and the productivity of the
employees he has hired, the personnel manager does the following:

(1) Computes the coefficient of correlation between the two series; 

(2) Shows what the estimated productivity of employees would be 
whose grades in the mental test fell on the mid-points of the class 
intervals of the mental test data. 

The means of the columns and of the rows are given in the table.
In addition, he obtains the following results:

$$\bar{x} = 42.17, \qquad \sigma_y = 17.41, \qquad r = .417,$$

$$\bar{y} = 87.31, \qquad \sigma_x = 8.40, \qquad m_1 = r\frac{\sigma_y}{\sigma_x} = .864.$$

Therefore, the line of regression of y on x is

$$\bar{y}_x - 87.31 = .864(x - 42.17)$$

or

(34) $\bar{y}_x = .864x + 50.88.$
This is the equation of the line that best fits the points which
designate the means of the columns (Figure 39). Hence, for an assigned
value of x, equation (34) gives the value of y which is the expected
mean of the column defined by the assigned value of x. The personnel
manager is thus prepared to predict the productivity of applicants
on the basis of their mental test grades. In other words, the
regression equation calculated from the records of those already hired may
be used in selecting from future applicants those most likely to
succeed.¹

Fig. 39. Means of Columns and Line of Regression
of y on x for Table 36
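
As a small illustration of how such a regression equation is used, the sketch below evaluates the line with slope .864 and intercept 50.88 (that is, 87.31 - .864 × 42.17) at the class marks of the mental test. The function name is hypothetical.

```python
def predicted_productivity(grade, slope=0.864, intercept=50.88):
    """Expected mean per cent of standard production for a given test grade."""
    return slope * grade + intercept


class_marks = [22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5]
estimates = {x: round(predicted_productivity(x), 2) for x in class_marks}
for x, y_hat in estimates.items():
    print(x, y_hat)
```

Each value is an estimate of a column mean, not of an individual employee's production.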

Exercises

1. Verify the value of r given for Table 36.

2. Verify the means of the columns given in Table 36.

3. Using equation (34) show what the estimated productivity of employees
in the factory referred to above would be whose mental test grades were
22.5, 27.5, etc.

4. For Table 35,

(a) Find the equations of the regression lines.

(b) Locate the axes through the mean of the table and graph the regression
lines.

(c) Compute $S_y$.

5. As in Exercise 4 for the table of heights and weights of 200 freshmen.

Ans. to (a),

$$\bar{y}_x = .069x + 58.3$$
$$\bar{x}_y = 3.36y - 89.4.$$

1 The critical reader may doubt if the value r = .417 is sufficiently large to
warrant much confidence in (34) as a predicting equation. The question of
reliability of predictions is discussed later.


17. $S_y$ for a Correlation Table. For ungrouped data we have
defined $S_y$ as a measure of the clustering of the data around the
regression line, and have observed that it is called the standard error
of estimate. In order to understand what $S_y$ has to do with
"estimates" it is necessary first to consider its meaning in a correlation
table. Let us denote by $S_{y.x}$ the standard error about the regression
line in the array of y's at x. Thus we have

(35) $S_{y.x}^2 = \frac{1}{f(x)}\sum_y ({}_o y - {}_c\bar{y}_x)^2 f(x, y)$

where ${}_o y$ denotes an observed y value and ${}_c\bar{y}_x$ denotes the value
obtained from the regression line for that column. Thus, for the
column headed 32.5 in Table 36 we obtain the computed value
$\bar{y}_x$ by substituting x = 32.5 in (34) whence we find $\bar{y}_x$ = 78.96.
To evaluate $S_{y.x}^2$ for this column we find the square of the deviation
of each of the 32 values of ${}_o y$ from 78.96, add the results and divide
by 32. Extracting the square root of the result we find $S_{y.x}$ = 15.96.
Moving along the regression line suppose we have computed an $S_{y.x}^2$
for each array of y's and averaged the results. It is interesting to
learn that this average is $S_y^2$. This is stated more precisely in the
following theorem.

Theorem V. The arithmetic mean of the values of $S_{y.x}^2$ for the several
columns when each $S_{y.x}^2$ is weighted with the frequency in that column is
$S_y^2 = \sigma_y^2(1 - r^2)$.

Proof: Using (35) we have

$$\frac{1}{N}\sum_x f(x)\,S_{y.x}^2 = \frac{1}{N}\sum_x \sum_y ({}_o y - {}_c\bar{y}_x)^2 f(x, y).$$

Substituting the value given by (8a), §15, in the right member of the
above identity we have

$$\frac{1}{N}\sum_x \sum_y \left[y - \left\{\bar{y} + r\frac{\sigma_y}{\sigma_x}(x - \bar{x})\right\}\right]^2 f(x, y),$$

that is

$$\frac{1}{N}\sum_{x,y}\left\{(y - \bar{y}) - r\frac{\sigma_y}{\sigma_x}(x - \bar{x})\right\}^2 f(x, y),$$

which reduces to $\sigma_y^2(1 - r^2)$. It is left as an exercise for the student
to show this.
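
Theorem V can be checked numerically on Table 36. The sketch below uses the unrounded least-squares line, so the value for the column at 32.5 comes out near 15.97 rather than exactly the 15.96 obtained from the rounded equation (34); the names are illustrative.

```python
import math

x_marks = [22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5]
y_marks = [125, 115, 105, 95, 85, 75, 65, 55]
cells = [                      # rows y = 125 ... 55, columns x = 22.5 ... 57.5
    [0, 0, 0, 0, 2, 3, 2, 0],
    [0, 0, 1, 3, 1, 4, 4, 4],
    [0, 0, 5, 7, 8, 11, 8, 7],
    [0, 2, 1, 10, 12, 9, 8, 2],
    [1, 3, 12, 11, 7, 12, 7, 1],
    [2, 1, 5, 6, 16, 8, 5, 0],
    [2, 5, 5, 8, 8, 6, 1, 0],
    [2, 3, 3, 4, 1, 1, 0, 0],
]
f_x = [sum(col) for col in zip(*cells)]
f_y = [sum(row) for row in cells]
n = sum(f_x)

x_bar = sum(x * f for x, f in zip(x_marks, f_x)) / n
y_bar = sum(y * f for y, f in zip(y_marks, f_y)) / n
var_x = sum(x * x * f for x, f in zip(x_marks, f_x)) / n - x_bar ** 2
var_y = sum(y * y * f for y, f in zip(y_marks, f_y)) / n - y_bar ** 2
cov = sum(x * y * cells[i][j] for i, y in enumerate(y_marks)
          for j, x in enumerate(x_marks)) / n - x_bar * y_bar
r = cov / math.sqrt(var_x * var_y)
m1 = cov / var_x
k = y_bar - m1 * x_bar


def s2_about_line(j):
    """S_{y.x}^2 of (35) for column j: mean squared deviation from the line."""
    y_hat = m1 * x_marks[j] + k
    return sum((y - y_hat) ** 2 * cells[i][j] for i, y in enumerate(y_marks)) / f_x[j]


weighted_mean = sum(s2_about_line(j) * f_x[j] for j in range(len(x_marks))) / n
s_y_squared = var_y * (1 - r * r)
print(round(math.sqrt(s2_about_line(2)), 2))          # column headed 32.5
print(round(math.sqrt(weighted_mean), 2), round(math.sqrt(s_y_squared), 2))
```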

For Table 36 we find $S_y$ = 15.83. In Figure 40 the parallel lines
on either side of the regression line RR' are drawn at a vertical
distance of $\pm S_y$ from it. They describe the average limits of
scatter above and below the regression line.

To connect $S_y$ with the reliability of predictions it is necessary
to introduce the concept of a correlation surface. Indeed,
a knowledge of the fundamental properties of a correlation
surface is desirable for a wider outlook on correlation theory in general.

18. Normal Correlation Surface. A correlation table may be
idealized into a surface in somewhat the same way that a histogram
is idealized into a frequency curve. The concept of a surface relates
to the universe from which the observed data of the table may be
regarded as a sample. Let the dimensions of the cells of a table be
$\Delta x$ and $\Delta y$, and suppose columns are erected upon these cells with
altitudes proportional to the frequencies in the cells. The result is
a sort of solid histogram. Then as $\Delta x \to 0$, $\Delta y \to 0$, $N \to \infty$, the
tops of the columns approach as a limit a smooth surface which is
called a correlation surface. Our discussion will be confined to the
case where we may assume that this limit is a normal correlation
surface. In discussing this surface it is convenient to let x and y
represent deviations from the respective means and to let z = f(x, y)
denote the frequency function representing the surface. Such a
surface is shown in Figure 41.

Any section of this surface parallel to the yz-plane is a normal
curve and represents the distribution in a column at x. Similarly
any section parallel to the xz-plane representing a row is a normal
curve. The frequency in a cell is measured by that portion of the
volume under the surface which lies over that cell. All those cells
in which the frequency is a fixed value lie on an ellipse. That is, if
contour lines are drawn on the surface joining the points of equal
height above the base they will be ellipses. In other words, sections
of the surface parallel to the xy-plane are ellipses.

Fig. 41. Frequency Surface for Correlated Variables

We will digress here for a brief discussion of an ellipse. We may
think of an ellipse as a transitional figure between a circle and a
straight line, as the circle flattens out. That is to say, the limiting
form of an ellipse is a circle at one extreme of the flattening
process and a straight line segment at the other extreme.
The degree of flatness is called the eccentricity of the ellipse,
and it is proved in analytic geometry that the eccentricity
varies from zero in the case of a circle to unity when the ellipse
degenerates into a line. All ellipses having the same eccentricity
whatever their size have the same relative proportions and are
therefore similar in form.

Fig. 42

The eccentricity of the elliptical contours of different normal
correlation surfaces varies with the amount of correlation existing in
the corresponding universe. A surface with narrow elliptical
contours represents a universe in which there is high correlation, whereas
if the variables are completely independent in the probability sense
the contour lines are circles when the variables are expressed in
standard units. If the variables are not expressed in standard units
(and r = 0) then the contour lines may be ellipses but their major
and minor axes will coincide with the x- and y-axes as in Figure 42.
When r ≠ 0 the axes of the ellipses make an angle with the xy-axes;
their major axis cuts quadrants I and III in the xy-plane if r > 0 (as
in Figure 41) and quadrants II and IV if r < 0.

19. Properties of Normal Bivariate Surface. The equation of a
normal correlation surface is given by

(36) $z = \dfrac{N}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\exp\left[-\dfrac{1}{2(1 - r^2)}\left(\dfrac{x^2}{\sigma_x^2} - \dfrac{2rxy}{\sigma_x\sigma_y} + \dfrac{y^2}{\sigma_y^2}\right)\right],$

where x and y represent the correlated
variables referred to their respective means as origin.

By means of (36) an observed distribution may be fitted with the
appropriate normal surface assuming that the sample might
reasonably have come from such a universe. This is accomplished by
replacing $\sigma_x$, $\sigma_y$, r, and N in (36) by the corresponding statistics
calculated from the sample and taking the origin at the mean of the
table. Let us assume that an observed distribution has been
graduated by such a surface and the theoretical cell frequencies obtained.
The surface extends to infinity in the xy-plane but contour ellipses
can be obtained which will enclose any desired percentage of the
given frequency when these ellipses are projected orthogonally onto
the xy-plane. They are all concentric, similar, and similarly placed.
Figure 43 represents such an ellipse, say the smallest one necessary
to enclose all the given cells. The systems of perpendicular chords
represent the columns and rows of the table.
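
A sketch of how a surface of the form (36) might be evaluated, taking the standard bivariate normal frequency function scaled by N and substituting the statistics quoted for Table 36 (N = 260, σ_x = 8.40, σ_y = 17.41, r = .417). Approximating a cell frequency by height times cell area (5 by 10) is a simplification introduced here; the text measures it by the volume over the cell.

```python
import math


def normal_surface(x, y, n, sigma_x, sigma_y, r):
    """Height z of the normal correlation surface; x and y are deviations from the means."""
    q = (x / sigma_x) ** 2 - 2 * r * (x / sigma_x) * (y / sigma_y) + (y / sigma_y) ** 2
    z0 = n / (2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - r * r))
    return z0 * math.exp(-q / (2 * (1 - r * r)))


N, SX, SY, R = 260, 8.40, 17.41, 0.417


def z(x, y):
    return normal_surface(x, y, N, SX, SY, R)


peak = z(0.0, 0.0)
central_cell = peak * 5 * 10            # rough frequency for a 5-by-10 cell at the mean
slope = R * SY / SX                     # slope of the regression line of y on x
print(round(peak, 4), round(central_cell, 1))
print(z(5.0, slope * 5.0) > z(5.0, slope * 5.0 + 1.0))  # section at x = 5 peaks on the line
```

The last line reflects the property that the section of the surface at a fixed x is highest where the regression line of y on x crosses it.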

The graduated frequencies for each column are normal
distributions whose means lie on the regression line of y on x and whose
standard deviations are in each case given by $S_y = \sigma_y\sqrt{1 - r^2}$.

To state the same thing in a slightly different way, an array of y's
corresponding to a fixed value $x_1$ of x is a normal distribution whose
mean deviates from $\bar{y}$ by $r(\sigma_y/\sigma_x)x_1$, and whose standard deviation is
$S_y = \sigma_y\sqrt{1 - r^2}$, which is independent of $x_1$ and therefore is the
same for all such arrays. Similarly an array of x's corresponding to
a particular value $y_1$ of y is a normal distribution with a mean which
deviates from $\bar{x}$ by $r(\sigma_x/\sigma_y)y_1$, and a standard deviation of
$S_x = \sigma_x\sqrt{1 - r^2}$, which is independent of $y_1$ and therefore is the same
for all such arrays. A careful study of Figure 41 will help in
understanding what is meant by these statements.

When the means $\bar{y}_x$ of the columns fall exactly on the regression
line, $S_{y.x}$ becomes the standard deviation of a column and is therefore
the same as $S_y$. Theorem V states that $S_y^2$ is an average of the values
of $S_{y.x}^2$, but when all the quantities being averaged have the same
value, as they do in the ideal case of the normal surface, their (mean)
average is that value. When the standard deviations of the columns
are equal, the regression system of y on x is called a homoscedastic
system. In a universe where they are not equal the system is said to
be heteroscedastic. For a homoscedastic system with linear regression,
$S_y = \sigma_y\sqrt{1 - r^2}$ is the standard deviation of each array of y's.

20. Reliability of Predictions. In using a regression equation to
make predictions we are naturally interested in the degree of
confidence to be expected in the predictions thus made. The use of $S_y$
in this connection is based upon the properties of the normal
correlation surface.

Let us imagine the universe of which Table 36 is a sample and
assume that it may be described by a normal surface. Confining
our attention to a section parallel to the yz-plane in Figure 41 we
know that an x array of y's is distributed normally about a value of
y determined by a designated value of x in the regression equation
of y on x. That is, the mean of this normal distribution is the
predicted value of y and its standard deviation is $S_y$. The
percentage distribution of such an array is the same as that given in
Figure 23 of Chapter VI, if $S_y$ is taken as the unit of measurement
along the horizontal axis. But an estimate of $S_y$ is its value
calculated from the sample. Moreover, for an observed distribution,
we have seen that $S_y$ is the average standard deviation of the several
columns and therefore it may reasonably be taken as an
approximation to the theoretical $S_y$ which in the universe is the same for
all the columns. We also take the calculated regression equation
as an approximation to the theoretical.

By measuring deviations from the predicted value in terms of
$S_y$ in the same way that $\sigma$ is used as a unit in measuring deviations
from the mean, we may then enter a normal probability scale for
the probability of a deviation involving multiples of $S_y$. According
to this scale the probability $P_y$ is about .68 for a deviation within
$\pm S_y$ of the predicted value, and the chances are even for a deviation
within $.6745\,S_y$ on either side of the predicted value.

For Table 36 we have found $S_y = 15.83$ and for an applicant
making $x = 32.5$ on the mental test we have predicted $y = 78.96$.
Therefore the chances are about 68 in 100 that his percentage of
productivity will be between $78.96 - 15.83$ and $78.96 + 15.83$, that
is, between 63.13 and 94.79. In other words, the probability is about
.68 that the predicted value will not be in error by more than 15.83.
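
The arithmetic of this example can be checked with a short sketch. It
assumes only the figures quoted above (predicted value 78.96 and
$S_y = 15.83$) and the normal law; the function names `prob_within` and
`prediction_interval` are illustrative and are not taken from the text.

```python
import math


def prob_within(t: float) -> float:
    """Probability that a standard normal deviate lies between -t and +t."""
    return math.erf(t / math.sqrt(2.0))


def prediction_interval(predicted: float, s_y: float, t: float):
    """Limits lying t standard errors on either side of the predicted value."""
    return predicted - t * s_y, predicted + t * s_y


# Figures quoted in the text for Table 36.
low, high = prediction_interval(78.96, 15.83, 1.0)
p_one = prob_within(1.0)        # deviation within one S_y
p_even = prob_within(0.6745)    # the "even chances" deviation

print(f"limits: {low:.2f} to {high:.2f}")
print(f"P(within 1.0000 S_y) = {p_one:.4f}")
print(f"P(within 0.6745 S_y) = {p_even:.4f}")
```

Any other multiple of $S_y$ may be passed to `prob_within` in the same
way when a different level of reliability is wanted.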

To summarize, in a normal bivariate universe each array is a
normal distribution and therefore its mean coincides with its mode.
Since regression is linear, a value predicted from the regression
equation of $y$ on $x$ is the mean value of $y$ for a designated value of $x$.
Then,

$$P_y = \int_{-t}^{t} \phi(t)\,dt$$

is the probability for a deviation from the predicted value $\bar{y}_x$ as
small as $|t|$, where $t$ is expressed in units of the standard error $S_y$
of a column. Thus,

$$t = \frac{y - \bar{y}_x}{S_y}.$$

Then $1 - P_y$ is the probability for a deviation as large as $|t|$.
Similarly, when dealing with the regression line of $x$ on $y$,

$$P_x = \int_{-t}^{t} \phi(t)\,dt$$

is the probability for a deviation from the predicted value as small
as $|t|$, where now $t = (x - \bar{x}_y)/S_x$.

Fig. 44. Representing an $x$ Array of $y$'s and Deviations of $\pm S_y$ from a
Predicted Value of $y$

Exercises 

1. Refer to problem 4, §4. Assume that the data given there are obtained
from a correlation table which is a representative sample from a normal
bivariate universe describing the heights and weights of senior men
students in colleges and universities of the United States. Then a value
predicted from the regression equation of $y$ on $x$ will give the mean of the
“column” at $x$. Similarly, for an assigned $y$, the corresponding $x$ in the
regression equation of $x$ on $y$ will be the mean of the “row” at $y$. Under
this assumption, determine the probability that Doe's height is outside
the interval 65.75 to 77.75 inches. What are the chances that Roe will
be between 100.8 and 122.4 pounds in weight?

Ans. $1 - P_y = .0027$, $P_x = .5$ (approximately).

2. Discuss the reliability of the predictions which you made in Exercise 3, §16.

Outline of Solution. Suppose a reliability level of $P_y = .5$ is desired.
Making the necessary assumptions, this allows a deviation of $t = \pm .6745$.
Since $S_y = 15.83$ we have

$$t = \frac{d}{15.83}$$

where $d = y - y_x$. That is, $y = y_x \pm\ ?$ For $x = 37.5$, $y =\ ? \pm\ ?$
So the probability is .5 that the standard of production will be between
what limits for a person making $x = 37.5$ on the mental test? The
problem is analogous for any other designated value of $P_y$ and for other
assigned values of $x$.




3. Consider the surface represented by (36). Prove that a section of the
surface parallel to the $yz$ coordinate plane is a normal curve with its mean
on the regression line of $y$ on $x$ and with variance $S_y^2 = \sigma_y^2(1 - r^2)$.
Outline of Solution. Write (36) in the form

(a)  $f = Ke^{-P}$

where $K$ is the constant coefficient of (36),
$P = (u^2 - 2ruv + v^2)/2(1 - r^2)$, $u = x/\sigma_x$, $v = y/\sigma_y$,
$\bar{x} = 0 = \bar{y}$.

The trace of the surface in the plane $u = u_1$ is determined by substituting
$u_1$ for $u$ in (a). This substitution yields the result

(b)  $f = Ce^{-T}$

where $T = (v - ru_1)^2/2(1 - r^2)$, $C = Ke^{-u_1^2/2}$. Upon returning to
$(x, y)$ coordinates, (b) becomes

(c)  $f = Ce^{-(y - m)^2/(2S_y^2)}$

where $m = rx_1\sigma_y/\sigma_x$ and $S_y^2 = \sigma_y^2(1 - r^2)$.

21. Non-Linear Regression. Correlation Ratio. We have seen 
that the regression systems of a normal correlation surface are linear. 
In a correlation table which is a representative sample from a normal 
bivariate universe the means of the arrays would lie approximately 
on straight lines. But in correlation tables which are samples of 
other types of universes, regression might not be linear. Moreover, 
one of the regression curves might be strictly linear and the other 
non-linear. The following numerical example illustrates the latter 
possibility. 



In this example, the regression of $y$ on $x$ is linear whereas that of $x$ on
$y$ is non-linear.

When the means of the columns (or of the rows) do not lie approx¬ 
imately on a straight line, the use of r may be misleading because 
r = 0 indicates absence of linear correlation only and not necessarily 
absence of correlation in general. 
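
A small numerical sketch may make the point concrete. The five pairs
below are hypothetical (they are not from the text): $y$ is exactly the
square of $x$, so the dependence is perfect, yet the product-moment
coefficient vanishes. The helper `pearson_r` is an illustrative name.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Hypothetical data: y is a single-valued, non-linear function of x.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.4f}")  # the linear coefficient misses the relation entirely
```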

One of the best treatments of this situation is that given in the
Carus Monograph on Mathematical Statistics, which will be reproduced
substantially here.

In introducing a correlation ratio, $\eta_{yx}$ (eta) of $y$ on $x$, as an
appropriate measure of correlation to take the place of the correlation
coefficient in such a situation, we may get suggestions as to what is
appropriate by solving for $r$ in (10). This gives

(37)  $r^2 = 1 - \dfrac{S_y^2}{\sigma_y^2}$

where we may recall that $S_y^2$ is the mean square of deviations from the
line of regression. Then

$$r = \pm\left[1 - \frac{S_y^2}{\sigma_y^2}\right]^{1/2}.$$

This formula could be used appropriately as a definition of $r$ in place of our
definition in (1), and its examination may throw further light on the significance
of $r$. When $S_y = 0$, the formula gives $r = \pm 1$ and, as we have seen earlier,
all the dots of the scatter diagram must then fall exactly on the line of regression.
When $S_y = \sigma_y$ the formula gives $r = 0$, and the regression line is in this case of
no aid in predicting the value of $y$ from assigned values of $x$. In the formula
$r^2 = 1 - S_y^2/\sigma_y^2$ it is important to keep in mind that the mean square
deviation $S_y^2$ is from the line of regression. Next, let $S_y'^2$ be the
corresponding mean square of deviations from the means of columns. Then
$S_y'^2 = S_y^2$ when the regression is strictly linear, but $S_y'^2 \neq S_y^2$
when the regression is non-linear. This fact suggests the use of a formula
closely related to $[1 - S_y^2/\sigma_y^2]^{1/2}$ for a measure of non-linear
regression by replacing $S_y$ by $S_y'$. We then write

(38)  $\eta_{yx}^2 = 1 - \dfrac{S_y'^2}{\sigma_y^2}$

where $\eta_{yx}$ is the correlation ratio of $y$ on $x$, and $S_y'^2$ is the mean
square of deviations from the means of the columns whether these means are
near to or far from the line of regression.

In general, we may say that the correlation ratio of $y$ on $x$ is a measure of the
clustering of dots about the means of columns.

An analogous discussion for the rows obviously leads to

$$\eta_{xy}^2 = 1 - \frac{S_x'^2}{\sigma_x^2}$$

giving $\eta_{xy}^2$, the square of the correlation ratio of $x$ on $y$.

That $\eta_{yx}^2 \le 1$ and that the equality holds only when all the dots in each
column are at the mean of the column follows at once from (38).

That $\eta_{yx}^2 \ge r^2$ may be shown by recalling the meanings of $S_y^2$ in (37)
and of $S_y'^2$ in (38). A mean square of deviations in each column is a minimum
when the deviations are taken from the mean of the array. Hence, the $S_y'^2$ in
(38) must be equal to or less than $S_y^2$ in (37) for the same data, since the
deviations in (37) are measured from the line of regression. Hence, we have
shown that

$$1 \ge \eta_{yx}^2 \ge r^2.$$

Moreover, when the regression of $y$ on $x$ is linear, $\eta_{yx}^2 - r^2$ found from
the sample differs from zero by an amount not greater than the fluctuations due
to random sampling. Hence, $\eta_{yx}^2 - r^2$ becomes a criterion for testing the
linearity of the regression of $y$ on $x$.

For computational purposes, it is desirable to express the correlation ratios
in a form involving the standard deviations of the means of arrays. For this
purpose, let $\bar{y}_x$ be the mean of any column of $y$'s and $\sigma_{\bar{y}_x}$
the standard deviation of the means of columns when the square
$(\bar{y}_x - \bar{y})^2$ of each deviation is weighted with the number $f(x)$ in
the column. Then it follows that

(39)  $\eta_{yx}^2 = \dfrac{\sigma_y^2 - S_y'^2}{\sigma_y^2} = \dfrac{\sigma_{\bar{y}_x}^2}{\sigma_y^2}.$

That is, the correlation ratio of $y$ on $x$ is the ratio of the standard deviation of
the means of columns to the standard deviation of all $y$'s.¹

To prove (39) we must show that $\sigma_y^2 - S_y'^2 = \sigma_{\bar{y}_x}^2$. We begin
by observing that the concentration of the dots in a column about
their mean may be measured in terms of their standard deviation.
Let $\sigma_{y.x}$ denote the standard deviation of the $y$'s in the column at $x$.
That is,

(40)  $\sigma_{y.x}^2 = \dfrac{1}{f(x)} \sum_y (y - \bar{y}_x)^2 f(x, y).$

Now, the concentration of the dots in the entire table about the
means of the columns may be measured by finding the mean value
of all such expressions $\sigma_{y.x}^2$ for all the columns of the table. But
since there are more points in some columns than in others, it will be
desirable to weight the $\sigma_{y.x}^2$ for each column by multiplying it by
the number of points or dots in the column. It is this weighted
mean value of the $\sigma_{y.x}^2$ which we have denoted by $S_y'^2$. That is,

(41)  $S_y'^2 = \dfrac{1}{N} \sum_x f(x)\,\sigma_{y.x}^2.$

In order to verify (39) we must now show that

$$\sigma_y^2 = S_y'^2 + \sigma_{\bar{y}_x}^2.$$

Adapting (14) of §9, Chapter V, to the notation of this chapter,
we have

(42)  $N\sigma_y^2 = \sum_x f(x)\,\sigma_{y.x}^2 + \sum_x f(x)(\bar{y}_x - \bar{y})^2.$

This follows from the fact that $N$ is composed of the several
sub-distributions $f(x)$ in the columns, and $\sigma_{y.x}$ is the standard deviation
of a column about its mean $\bar{y}_x$. It is obvious that

$$\frac{1}{N} \sum_x f(x)(\bar{y}_x - \bar{y})^2$$

gives the variance $\sigma_{\bar{y}_x}^2$ of the means of the columns. The above
expression (42) then becomes

$$N\sigma_y^2 = NS_y'^2 + N\sigma_{\bar{y}_x}^2,$$

which reduces to $\sigma_y^2 - S_y'^2 = \sigma_{\bar{y}_x}^2$; hence we obtain (39).

¹ Rietz, Carus Monograph on Mathematical Statistics, p. 89 et seq.

22. Computation of $\eta^2$. It should be instructive to compute
$\eta_{yx}^2$ for Table 36, by both relations (38) and (39).

For (38) we have the following:

$$\sigma_{y.x}^2 = \frac{1}{f(x)} \sum_y (y - \bar{y}_x)^2 f(x, y).$$

    $\sigma_{y.x}^2$     $f(x)$
      106.12¹               7
      191.83               14
      246.48               32
      283.63               49
      257.65               55
      294.51               54
      222.53               35
       71.43               14

$S_y'^2 = 246.45$

$\sigma_y^2 = (17.41)^2 = 303.11$

$$\eta_{yx}^2 = 1 - \frac{246.45}{303.11} = .1869.$$

For (39) we have the following:

$$\eta_{yx}^2 = \frac{1}{N\sigma_y^2} \sum_x (\bar{y}_x - \bar{y})^2 f(x), \qquad \bar{y} = 87.31.$$

    $\bar{y}_x$     $f(x)$
      67.86            7
      72.14           14
      81.87           32
      84.80           49
      85.73           55
      90.92           54
      95.57           35
     105.00           14

$\sigma_{\bar{y}_x}^2 = 56.66$ (see Exercise 3, p. 97).

$$\eta_{yx}^2 = \frac{56.66}{303.11} = .1869.$$

¹ See Table 27 and Table 16.

In verifying (39) for this example we have $\sigma_y^2 - S_y'^2 = 303.11 - 246.45
= 56.66$ and $\sigma_{\bar{y}_x}^2 = 56.66$.
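
Both computations can be repeated mechanically. The sketch below assumes
that the printed figures pair off as read here: eight columns with
frequencies 7, 14, 32, 49, 55, 54, 35, 14, their variances, and their
means. The variable names are illustrative. Because the inputs are
rounded to two decimals, the two routes agree only to about three
decimal places.

```python
# Column summaries for Table 36 as printed in the text (rounded figures).
freq = [7, 14, 32, 49, 55, 54, 35, 14]
col_var = [106.12, 191.83, 246.48, 283.63, 257.65, 294.51, 222.53, 71.43]
col_mean = [67.86, 72.14, 81.87, 84.80, 85.73, 90.92, 95.57, 105.00]
mean_y = 87.31
var_y = 17.41 ** 2

n = sum(freq)

# Relation (38): weighted mean of the column variances.
within = sum(f * v for f, v in zip(freq, col_var)) / n
eta_sq_38 = 1 - within / var_y

# Relation (39): weighted variance of the column means.
between = sum(f * (m - mean_y) ** 2 for f, m in zip(freq, col_mean)) / n
eta_sq_39 = between / var_y

print(f"N = {n}")
print(f"S_y'^2 = {within:.2f}, eta^2 by (38) = {eta_sq_38:.4f}")
print(f"variance of column means = {between:.2f}, eta^2 by (39) = {eta_sq_39:.4f}")
```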

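The above illustrations are useful in giving an understanding of
the meaning of $\eta_{yx}$. However, for computational purposes, another
formula may be derived which involves less labor than either (38)
or (39). In fact, the computation of a correlation ratio may be very
conveniently performed by an easy extension of a correlation table.
The derivation of the appropriate formula will now be given.

The standard deviation ($\sigma_{\bar{y}_x}$) of the means of the columns may be
expressed in the $(u, v)$ units by the relation

$$\sigma_{\bar{y}_x} = k\,\sigma_{\bar{v}_u}, \qquad \text{where} \qquad
\sigma_{\bar{v}_u}^2 = \frac{1}{N} \sum_u f(u)\,\bar{v}_u^2 - \bar{v}^2,$$

which is the definition of the standard deviation of the variable $\bar{v}_u$.
This is apparent if we observe that the mean for the whole table in
the $v$-direction ($\bar{v}$) is the mean of the quantities $\bar{v}_u$ for the several
columns.¹

Since

$$\bar{v}_u = \frac{1}{f(u)} \sum_v v\,f(u, v),$$

we have

$$\sigma_{\bar{v}_u}^2 = \frac{1}{N} \sum_u \frac{\left[\sum_v v\,f(u, v)\right]^2}{f(u)} - \bar{v}^2.$$

Recalling that $\sigma_y^2 = k^2\sigma_v^2$, we have

$$\eta_{yx}^2 = \frac{k^2\sigma_{\bar{v}_u}^2}{k^2\sigma_v^2},$$

that is,

(43)  $\eta_{yx}^2 = \dfrac{\dfrac{1}{N} \sum_u \dfrac{\left[\sum_v v\,f(u, v)\right]^2}{f(u)} - \bar{v}^2}{\sigma_v^2}.$

An analogous discussion for the rows of $x$'s leads to

(44)  $\eta_{xy}^2 = \dfrac{\dfrac{1}{N} \sum_v \dfrac{\left[\sum_u u\,f(u, v)\right]^2}{f(v)} - \bar{u}^2}{\sigma_u^2},$

giving the square of the correlation ratio of $x$ on $y$.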
¹ $\bar{v} = \dfrac{1}{N} \sum_u \sum_v v\,f(u, v) = \dfrac{1}{N} \sum_u f(u)\,\bar{v}_u.$

Example. Find $\eta_{yx}^2$ for Table 35. Solution: Referring to this table and
using (43) we obtain the following results:

    $[\sum_v v f(u, v)]^2$     $[\sum_v v f(u, v)]^2 / f(u)$
             49                          3.78
                                          .75
              9                           .47
             49                          2.45
            900                         37.50
            121                         20.17
            169                         28.17
                               Sum      93.29

$\bar{v}^2 = .2916$, $\sigma_v^2 = 1.8084$, $N = 100$.

$\eta_{yx}^2 = .3546.$
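
The result can be checked from the printed quotients alone. The sketch
assumes that (43) has the form
$\eta_{yx}^2 = \left(\frac{1}{N}\sum_u [\sum_v v f(u,v)]^2/f(u) - \bar{v}^2\right)/\sigma_v^2$,
a reading inferred here from the worked figures, and that the seven
quotients listed belong to the seven columns of Table 35. The variable
names are illustrative.

```python
# Quotients [sum_v v f(u, v)]^2 / f(u) for the columns, as printed.
quotients = [3.78, 0.75, 0.47, 2.45, 37.50, 20.17, 28.17]

n = 100              # total frequency N
v_bar_sq = 0.2916    # square of the mean of v
sigma_v_sq = 1.8084  # variance of v

total = sum(quotients)
eta_sq = (total / n - v_bar_sq) / sigma_v_sq

print(f"sum of quotients = {total:.2f}")
print(f"eta_yx^2 = {eta_sq:.4f}")
```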

It may be well to mention that the value of $\eta$ is not independent
of the classification of the data. As the class intervals become
narrower, $\eta$ approaches unity. This may be understood from (38).
If the grouping were so fine that only one item appeared in each
column, then it would constitute the mean of that column. In this
case $S_y'$ would be zero and $\eta$ would therefore be unity. On the other
hand, a very coarse grouping tends to make the value of $\eta$ approach $r$.
“Student” has given a formula for “The Correction to be Made in the
Correlation Ratio for Grouping” in Biometrika, vol. IX, pp. 316-320.
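
The effect of narrowing the class intervals can be seen in a small
sketch. The twelve pairs below are hypothetical and serve only as an
illustration; the function `eta_squared` is an illustrative name and
computes the ratio of the weighted variance of the column means to the
total variance, as in (39). The class widths 6, 3, and 1 are chosen so
that each grouping subdivides the one before it.

```python
def eta_squared(xs, ys, width):
    """Correlation ratio (squared) of y on x with x grouped in classes of `width`."""
    n = len(ys)
    mean_y = sum(ys) / n
    total_var = sum((y - mean_y) ** 2 for y in ys) / n

    # Collect the y values falling in each class interval (column).
    columns = {}
    for x, y in zip(xs, ys):
        columns.setdefault(x // width, []).append(y)

    # Weighted variance of the column means about the general mean.
    between = sum(
        len(col) * (sum(col) / len(col) - mean_y) ** 2 for col in columns.values()
    ) / n
    return between / total_var


# Hypothetical data, twelve items with distinct x values.
xs = list(range(12))
ys = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]

results = {width: eta_squared(xs, ys, width) for width in (6, 3, 1)}
for width, value in results.items():
    print(f"class width {width}: eta^2 = {value:.4f}")
```

With a class width of 1 each column holds a single item, so every
column mean coincides with its item and the ratio reaches unity.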

23. Further Discussion. Test for Linearity of Regression. Let
us consider the totality of mean points $(x, \bar{y}_x)$ of the columns and
think of a curve connecting them. Of course, for a table of observed
data, it is possible to draw many such curves. In order to show
clearly why a comparison of $\eta_{yx}^2$ and $r^2$ is the basis of a test for
linearity of regression, it will be necessary to consider a theoretical
table in which there is only one such curve. When we speak of the
regression curve we are thinking, not of the given table in which the
dimensions of the cells are $h$ and $k$, but of an ideal table in which there
is an infinity of cells of zero dimensions. To put it another way, consider
a sample of $N$ pairs of values $(x_i, y_i)$ from which a correlation table
is made with cells whose dimensions are $h$ and $k$. If parallelepipeds
are erected on the cells with heights proportional to the frequencies,
the result is a solid histogram bounded by a broken surface. As
$h \to 0$, $k \to 0$, and $N \to \infty$, this histogram will approach some solid,
bounded by a smooth surface. An example of such a surface is the
normal correlation surface. In such an ideal table, it is possible to
have but one curve connecting the means of the columns. This
curve is sometimes styled the true regression curve of $y$ on $x$. In an
analogous way for the means of the rows there would be a true
regression curve of $x$ on $y$. It is one of these curves that we have in
mind when we speak of “the regression curve” or “the regression.”
For a normal bivariate universe (represented by a normal correlation
surface), regression is linear. But for other types of bivariate
universes (which might be represented by skew surfaces), it is
conceivable that regression might be parabolic or exponential or
some other type of curve. In such types, regression is said to be
non-linear. The curve which is chosen to approximate the true
regression curve must not be confused with the true regression
curve. The latter notion relates to the ideal universe from which the
data at hand are a sample. It is defined as the locus of the mean points
of the columns of the theoretical table. When we fit a curve to
the means of the columns of an observed table, this regression curve
is merely an approximation to the ideal set up in the definition.
Similar statements may be made about the regression of $x$ on $y$.

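We will now recapitulate the expressions used in the comparative
analysis of $r^2$ and $\eta_{yx}^2$ for an observed table. Here $\bar{y}_x$ denotes
the mean of the column at $x$ and $y_x$ the ordinate of the regression line at $x$.

(45)  $\eta_{yx}^2 = 1 - \dfrac{S_y'^2}{\sigma_y^2}, \qquad
S_y'^2 = \dfrac{1}{N} \sum_x \sigma_{y.x}^2\,f(x), \qquad
\sigma_{y.x}^2 = \dfrac{1}{f(x)} \sum_y (y - \bar{y}_x)^2 f(x, y).$

(46)  $r^2 = 1 - \dfrac{S_y^2}{\sigma_y^2}, \qquad
S_y^2 = \dfrac{1}{N} \sum_x S_{y.x}^2\,f(x), \qquad
S_{y.x}^2 = \dfrac{1}{f(x)} \sum_y (y - y_x)^2 f(x, y).$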

Recall that $\sigma_{y.x}^2$ is defined as the variance in a column and therefore
as the square of the standard error about the regression curve,
whatever it may be, which goes through the means of the columns. $S_y'^2$
is an average of the $\sigma_{y.x}^2$ values, and $\eta_{yx}^2$ is defined in terms of $S_y'$.
Correspondingly, $S_{y.x}^2$ is the square of the standard error in a column
about the line which best fits the means of the columns. $S_y^2$ is an
average of the $S_{y.x}^2$ values, and $r^2$ is defined in terms of $S_y$. If
regression is linear, the means of the columns will fall on the
best-fitting line and $\sigma_{y.x}^2$ becomes the same as $S_{y.x}^2$. Then
$S_y'^2 = S_y^2$, and hence $\eta_{yx}^2 = r^2$.

It is interesting to observe that $\sigma_{y.x}^2$ is the second moment about
the mean, for an array of $y$'s, i.e., for a column. In the notation
of moments it could be denoted by $\mu_{2:y.x}$. In this notation, $S_{y.x}^2$
could be denoted by $\nu_{2:y.x}$, being the second moment in an array
of $y$'s about a point other than its mean. Since $\mu_2 \le \nu_2$, it follows
that $\sigma_{y.x}^2 \le S_{y.x}^2$. Therefore $S_y'^2 \le S_y^2$ and
$\eta_{yx}^2 \ge r^2$. If each $y$ value of a column is at the mean of that
column then it is obvious that $\sigma_{y.x}^2$ will be zero. In this case,
$S_y' = 0$, and $\eta_{yx}^2 = 1$. On the other hand, for any column, the
contribution of $\sigma_{y.x}^2 f(x)$ to $S_y'^2$ cannot exceed its contribution to
$\sigma_y^2$. Taking the weighted mean of the respective contributions over all
the columns, we have $S_y'^2 \le \sigma_y^2$ and hence

$$0 \le \eta_{yx}^2 \le 1.$$


Writing (38) in the form

$$S_y' = \sigma_y(1 - \eta_{yx}^2)^{1/2}$$

we see that $S_y'$ is a measure of dispersion about the regression curve
(which is the locus of the means) corresponding to
$S_y = \sigma_y(1 - r^2)^{1/2}$, which is the standard error about the “best”
line. If $r^2 = 1$, then $y$ is related to $x$ by a linear function. If
$\eta_{yx}^2 = 1$, it follows that $y$ is a single-valued function of $x$. On the
other hand, if $r = 0$, it does not necessarily follow that there is no
relation¹ between $y$ and $x$. If $\eta_{yx}^2 = 0$ then $r = 0$, but if $r = 0$ it
does not necessarily follow that $\eta_{yx}^2 = 0$.

¹ See H. L. Rietz, “On Functional Relations for which the Coefficient of
Correlation is Zero,” Journal American Statistical Association, vol. 16,
1919, pp. 472-76.

In the ideal table, regression of $y$ on $x$ is linear if and only if
$\eta_{yx}^2 - r^2 = 0$. But in the case of an observed table, allowance must
be made for sampling fluctuations. A corresponding analysis could
be made for $r^2$ and $\eta_{xy}^2$, and $\eta_{xy}^2 - r^2$ computed from the sample
should differ from zero by an amount not greater than the fluctuations due
to chance, if regression of $x$ on $y$ is linear. The question naturally
arises, what discrepancy between the computed values of $\eta^2$ and $r^2$
may be tolerated before we conclude that regression is non-linear?
This problem has been investigated, and Blakeman¹ has proposed a
testing formula. If certain assumptions are made, a simple though
approximate test may be deduced from Blakeman's formula. According
to this approximate test if

(47)  $N(\eta^2 - r^2) < 11.4$

then linear regression may be assumed. Since there are two $\eta$'s there
are two tests. It is possible for one of the regression curves to be
linear and the other not.

Evaluating (47) for Table 35 we obtain $100[.3546 - (.58)^2] = 1.82$,
so the regression of $y$ on $x$ may be assumed to be linear.

R. A. Fisher has shown that the Blakeman test is not very reliable.
One can easily construct an example for which regression is obviously
non-linear yet which satisfies the criterion (47). Consider the
following table:

    y \ x     1     2     3     4     5
      3       0     0     1     0     0
      2       0     1     0     1     0
      1       1     0     0     0     1

Here, $N = 5$, $\sum xy = 27$, $\bar{x} = 3$, $\bar{y} = 9/5$. From (3), therefore,
$r = 0$. From (40) and (41), $S_y' = 0$ and $\eta_{yx} = 1$. Applying (47),
Blakeman's test yields a verdict of linear regression of $y$ on $x$. It
appears that Blakeman's criterion is of doubtful utility. A more
efficient method of testing linearity of regression is given in Part II.
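
The counterexample can be verified directly. The sketch assumes the five
observations are the points (1, 1), (2, 2), (3, 3), (4, 2), (5, 1), a
reading that agrees with the stated $N = 5$, $\sum xy = 27$, $\bar{x} = 3$,
and $\bar{y} = 9/5$. The function names are illustrative, and the quantity
tested is $N(\eta^2 - r^2)$ as in (47).

```python
import math

# The five observations read from the table (an assumption about its layout).
points = [(1, 1), (2, 2), (3, 3), (4, 2), (5, 1)]


def pearson_r(pts):
    """Product-moment correlation coefficient."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pts) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pts) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pts) / n)
    return cov / (sd_x * sd_y)


def eta_squared_y_on_x(pts):
    """Squared correlation ratio of y on x, taking each x value as a column."""
    n = len(pts)
    mean_y = sum(y for _, y in pts) / n
    total_var = sum((y - mean_y) ** 2 for _, y in pts) / n
    columns = {}
    for x, y in pts:
        columns.setdefault(x, []).append(y)
    between = sum(
        len(col) * (sum(col) / len(col) - mean_y) ** 2 for col in columns.values()
    ) / n
    return between / total_var


r = pearson_r(points)
eta_sq = eta_squared_y_on_x(points)
statistic = len(points) * (eta_sq - r ** 2)

print(f"r = {r:.4f}, eta^2 = {eta_sq:.4f}")
print(f"N(eta^2 - r^2) = {statistic:.2f}, below 11.4: {statistic < 11.4}")
```

The statistic falls below 11.4 simply because $N$ is small, although
the regression of $y$ on $x$ rises and then falls.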

Exercises 

1. Using (43) and (44) find $\eta_{yx}^2$ and $\eta_{xy}^2$ for the table referred to in
Exercise 4, page 221. Apply the test (47) and state your opinion about the
linearity of regressions.

¹ See Handbook of Mathematical Statistics, Rietz and others, p. 131.

2. In the following table, $x$ = Interest Rates, 4-6 months Commercial Paper;
$y$ = Total Bills Discounted by Federal Reserve Banks (1923-1932). Find
$r$ and $\eta_{yx}^2$. Form an opinion about linearity of regression of $y$ on $x$.
(Data from Elements of Statistics, Davis and Nelson, page 288.)


Class 

Marks 

y 




1 

1 

1 

1 

1 

1 

1 

7 





B 


1 

6 

6 

6 

6 




i 

B 

2 

3 

4 



5 





1 

3 

1 

2 



4 




2 

B 

9 

4 

1 



3 


1 

2 

B 


9 

4 




2 


1 


■ 

11 

5 

B 

■ 



1 

4 


2 

3 

3 

1 

1 

B 



0 

2 

3 

3 

6 

3 


1 

B 



Class 

Marks 

X 

0 

B 

2 

3 

4 

5 

6 

7 

8 

9 


3. In §44, Statistical Methods for Research Workers, R. A. Fisher writes: “The
sum of the squares of the deviations of all the values of y from their
general mean may be broken up into two parts, one representing the sum of
the squares of the deviations of the means of the arrays from the general
mean, each multiplied by the number in the array, while the second is the
sum of the squares of the deviations of each observation from the mean of
the array in which it occurs.” [Compare with our (14a) of Chapter V.]
Prove Fisher's statement. Hint. In symbols, you are to prove that

$$V = v_1 + v_2$$

where

$$V = \sum_{x,y} (y - \bar{y})^2 f(x, y), \qquad
v_1 = \sum_x (\bar{y}_x - \bar{y})^2 f(x), \qquad
v_2 = \sum_{x,y} (y - \bar{y}_x)^2 f(x, y).$$

4. Prove that $\eta_{yx}^2$ is the ratio between $v_1$ and $V$ as defined in Exercise 3.

5. The mortality experience during the early years of an insurance company
presents an interesting study in correlation. The following table shows
for male lives the correlation between the ages ($x$) of the insured at issue
of policy and his age ($y$) at death. Data of the Midland Life Insurance
Company,¹ 1906-1924.



Find $r$, the two $\eta$'s, and the equations of the lines of regression.

¹ From a paper “On Certain Applications of Mathematical Statistics to
Actuarial Data” in The Record, American Institute of Actuaries, vol. XIII,
Part II, No. 28, November, 1924.

24. Correlation from Ranks. Before defining rank we will find the
variance of the difference, $z$, between corresponding values of two
variables. Let $x$ and $y$ denote corresponding values of two series each
consisting of $N$ variates. Form a third series $z$ where $z_i = x_i - y_i$.
Then the mean of $z$ is given by $\bar{z} = \bar{x} - \bar{y}$ and the standard
deviation of $z$ is, by definition, given by

$$\sigma_z^2 = \frac{1}{N} \sum z^2 - \bar{z}^2.$$

Replacing $z$ by its equal $x - y$, we have

$$\sigma_z^2 = \frac{1}{N} \sum (x^2 - 2xy + y^2) - \bar{x}^2 - \bar{y}^2 + 2\bar{x}\bar{y}$$

$$= \left\{\frac{1}{N} \sum x^2 - \bar{x}^2\right\}
 - 2\left\{\frac{1}{N} \sum xy - \bar{x}\bar{y}\right\}
 + \left\{\frac{1}{N} \sum y^2 - \bar{y}^2\right\}.$$

Whence

(48)  $\sigma_z^2 = \sigma_x^2 - 2r\sigma_x\sigma_y + \sigma_y^2.$


If the variables $x$ and $y$ are uncorrelated, we have as a special case

$$\sigma_z^2 = \sigma_x^2 + \sigma_y^2.$$

Solving (48) for $r$, we obtain

(49)  $r = \dfrac{\sigma_x^2 + \sigma_y^2 - \sigma_z^2}{2\sigma_x\sigma_y}.$

This is another expression for the correlation coefficient and involves
standard deviations only. In particular, it may be used to advantage
when $x$ and $y$ denote ranks, where by rank we mean order of magnitude
or importance. That is, rank refers to the position of a variate
in an arrangement.
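
The agreement between the product-moment form of $r$ and the form
involving standard deviations only can be checked numerically. The five
pairs of values below are hypothetical, and the helper names are
illustrative; the second computation takes $r$ as
$(\sigma_x^2 + \sigma_y^2 - \sigma_z^2)/(2\sigma_x\sigma_y)$ with $z = x - y$.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Mean square deviation from the mean (divisor N)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Hypothetical paired values.
xs = [2, 4, 4, 7, 9]
ys = [1, 3, 6, 5, 10]
zs = [x - y for x, y in zip(xs, ys)]  # the difference series

var_x, var_y, var_z = variance(xs), variance(ys), variance(zs)

# Product-moment form of r.
cov = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)
r_product = cov / math.sqrt(var_x * var_y)

# Form involving variances of x, y, and z only.
r_from_sd = (var_x + var_y - var_z) / (2 * math.sqrt(var_x * var_y))

print(f"r (product-moment)      = {r_product:.6f}")
print(f"r (from variances only) = {r_from_sd:.6f}")
```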

If $x$ and $y$ denote the ranks of the same item with respect to two
characteristics, and no ranks are omitted, and there are no duplications
of ranks, then both $x$ and $y$ refer to the integers from 1 to $N$.
Therefore, $\bar{x} = \bar{y}$, and $\sigma_x^2 = \sigma_y^2 = \frac{1}{12}(N^2 - 1)$.
See Theorem VI, Chapter V. Moreover,

$$\sigma_z^2 = \frac{1}{N} \sum (x - y)^2 - (\bar{x} - \bar{y})^2
= \frac{1}{N} \sum (x - y)^2, \quad \text{since } \bar{x} - \bar{y} = 0.$$

Let $R$ denote the correlation coefficient when $x$ and $y$ refer to ranks
rather than variates. Then (49) becomes

$$R = \frac{\frac{1}{6}(N^2 - 1) - \frac{1}{N} \sum (x - y)^2}{\frac{1}{6}(N^2 - 1)}$$

which simplifies into

(50)  $R = 1 - \dfrac{6 \sum (x - y)^2}{N(N^2 - 1)}.$

This is known as Spearman's formula for rank correlation.

If two or more variates are tied it is customary to divide the 
corresponding rank numbers among the variates concerned, using 
fractions if necessary. 
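
Spearman's formula, taken as $R = 1 - 6\sum(x - y)^2 / [N(N^2 - 1)]$, may
be sketched as follows. The ten pairs of ranks are those of the worked
example of this section; the function names are illustrative. As a
check, the same ranks are also passed through the ordinary
product-moment coefficient, which agrees when there are no ties.

```python
import math


def spearman_R(rank_x, rank_y):
    """Rank correlation from the sum of squared rank differences (no ties)."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def pearson_r(xs, ys):
    """Product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Ranks of individuals A to J in the two subjects of the worked example.
rank_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rank_y = [2, 4, 1, 6, 7, 3, 9, 5, 10, 8]

sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
R = spearman_R(rank_x, rank_y)
r_on_ranks = pearson_r(rank_x, rank_y)

print(f"sum of squared differences = {sum_d2}")
print(f"R = {R:.3f}, product-moment r on ranks = {r_on_ranks:.3f}")
```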

Example. Suppose we have the following scores made in two tests, arranged
in the order of their rank. Find the correlation between ranks.

                 1st Subject        2nd Subject
    Individual   Score   Rank x     Score   Rank y     x - y    (x - y)²
        A          92       1         85       2         -1         1
        B          86       2         76       4         -2         4
        C          84       3         93       1          2         4
        D          78       4         68       6         -2         4
        E          71       5         67       7         -2         4
        F          69       6         83       3          3         9
        G          66       7         54       9         -2         4
        H          58       8         70       5          3         9
        I          53       9         43      10         -1         1
        J          45      10         59       8          2         4

    N = 10                                               Total     44

We find $R = 1 - \dfrac{6(44)}{10(99)} = .733.$

Exercises

1. Suppose $z = x + y$. How would this change formulas (48) and (49)?

2. Twelve salesmen are ranked in order of merit for efficiency by their manager.
They are also ranked in accordance with their length of service. What
indication is there of a relation between length of service and efficiency?
(Garrett.)

                 Years of    Order of Merit    Order of Merit
    Salesmen     Service       (Service)          (Effic.)
       A            5             7.5                 6
       B            2            11.5                12
       C           10             2                   1
       D            8             4                   9
       E            6             6                   8
       F            4             9                   5
       G           12             1                   2
       H            2            11.5                10
       I            7             5                   3
       J            5             7.5                 7
       K            9             3                   4
       L            3            10                  11

The fractions in the third column denote ties in rank. Thus, A and J each
served 5 years and each is ranked 7.5. The next individual is ranked 9.
Ans. $R = .80$.

3. Find $R$ for the following data:

    Individual   Rank   Score     Rank   Score
        A          1      92        2      88
        B          2      89        4      85
        C          3      87        1      93
        D          4      86        6      79
        E          5      83        7      70
        F          6      77        3      87
        G          7      71        9      52
        H          8      62        5      84
        I          9      53       10      41
        J         10      40        8      64

Ans. $R = .733$.

25. Interpretation. Common Elements. Although statistical
theory gives a description of the indicated relationship between two
related variables, the interpretation of the results “abound in pitfalls
easily overlooked by the unwary, while they are cantering gaily
along upon their arithmetic.”

The methodological side has been developed until we can find correlation
coefficients by simply turning a crank, but the explanation of the meaning of
the result after we find it, needs a brain.... No amount of mathematical
training and ability can take the place of the judgment and common sense that
comes from a knowledge of the field in which the problem lies.¹

In the interpretation of $r$ one should avoid imputing any causal
relationship between the variables. In this connection the following
pungent remarks of Professor E. B. Wilson* may be appropriately
quoted:

Correlation is a mutual affair between two numerical variables; the correlation 
coefficient r is symmetrical with respect to them. Strictly, y is not correlated 
with X or X with y, but x and y are correlated. Theory is very important in 
indicating what facts should be looked for as significant; facts are significant
or important largely as they indicate theory, but neither compels the other, as
the histories of theorizing and of fact finding amply demonstrate.... Further,
the value of the correlation coefficient depends on the group for which it is deter¬ 
mined or on the universe of which that group is a fair sample. The correlation 
coefficient r of height and weight for a group containing humans from infancy to 
adult life would be different from, and in fact greater than, the coefficient for 
college students or for the members of a football squad; there is no such thing 
as the correlation coefficient per se. 

If the student has mastered the underlying mathematical theory 
he should be able to understand and profit by the interpretations 
given by the writers in his particular field of interest. As a final 
aid in forming a conception of its meaning, we state a theorem which 
gives to r a meaning in pure chance. If x and y are affected by s 
equally likely causes of which t are common to both, then r = t/s. 

¹ Crathorne in Journal of the American Statistical Association, vol. 26,
Supplement, March, 1931, p. 27.

* “Correlation and Association,” Journal of the American Statistical
Association, vol. 26 (1931), pp. 250-256.

Theorem VI. An urn containing white and black balls is so maintained
that in drawing a ball the probability of getting a white ball is a
constant $p$ and that of getting a black ball is $q$ $(= 1 - p)$. The first
drawing of a pair of drawings is to consist of $s$ balls taken one at a time
from the urn. The second drawing is to consist of $s$ balls of which $t$ are
taken at random from the $s$ first drawn, and $s - t$ are drawn one at a time
from the urn. Then the correlation coefficient between the numbers of
white balls in the two drawings is $t/s$.

As an illustration of the theorem we will take $s = 5$, $t = 3$, $p = \frac{1}{4}$.
Let $x$ be the number of white balls in the first drawing and $y$ the
number of white balls in the second drawing. Then Table 37,
constructed by the theory of probability,¹ exhibits the a priori
frequencies when we use as small numbers as possible for frequencies
subject to the condition that each frequency is to be an integer.

Table 37. A Priori Frequencies

    y \ x       0       1       2       3       4       5      f(y)
      5         0       0       0       9       6       1        16
      4         0       0      81     108      45       6       240
      3         0     243     648     432     108       9      1440
      2       243    1620    1728     648      81       0      4320
      1      1458    3159    1620     243       0       0      6480
      0      2187    1458     243       0       0       0      3888
     f(x)    3888    6480    4320    1440     240      16    16,384


According to the theorem the correlation coefficient should be $\frac{3}{5}$.
It is left as an exercise for the student to show, by computing $r$ from
the table, that this is actually the case.
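
For readers who wish to check their arithmetic, the computation can be
sketched as follows. The sketch assumes that Table 37 is laid out with
rows $y = 5, 4, \ldots, 0$ and columns $x = 0, 1, \ldots, 5$, a reading
that reproduces the printed marginal totals; the variable names are
illustrative.

```python
import math

# Frequencies of Table 37: rows[y][x], with x running from 0 to 5.
rows = {
    5: [0, 0, 0, 9, 6, 1],
    4: [0, 0, 81, 108, 45, 6],
    3: [0, 243, 648, 432, 108, 9],
    2: [243, 1620, 1728, 648, 81, 0],
    1: [1458, 3159, 1620, 243, 0, 0],
    0: [2187, 1458, 243, 0, 0, 0],
}

# Flatten into (x, y, frequency) triples.
cells = [(x, y, f) for y, row in rows.items() for x, f in enumerate(row)]
total = sum(f for _, _, f in cells)

mean_x = sum(x * f for x, _, f in cells) / total
mean_y = sum(y * f for _, y, f in cells) / total
var_x = sum(f * (x - mean_x) ** 2 for x, _, f in cells) / total
var_y = sum(f * (y - mean_y) ** 2 for _, y, f in cells) / total
cov = sum(f * (x - mean_x) * (y - mean_y) for x, y, f in cells) / total

r = cov / math.sqrt(var_x * var_y)

print(f"N = {total}, mean x = {mean_x}, mean y = {mean_y}")
print(f"r = {r:.4f}")
```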




¹ Explained in Part II.

Review Questions and Problems

1. Define the following terms: statistics, variate, discrete, class interval,
class mark, $x$-array of $y$'s, range, regression line, sample, universe,
coefficient of variation, variance.

2. Name and define five averages. Discuss their advantages and limitations.

3. What does a ratio chart show that a chart with a uniform scale does not? If
you wished to plot data so as to secure the effect of a ratio chart, but had
no ratio paper available, how would you accomplish the desired result?

4. Prove the following:

(a) The algebraic sum of the deviations of the variates from their mean
is zero.

(b) The second moment about an arbitrary point equals the second moment
about the mean increased by the square of the distance between
the arbitrary point and the mean.

5. (a) Define and explain how to compute the following:
$Q_1$, $Q_2$, $Q_3$, MD, $S$, $\sigma$.

(b) In the case of a normal distribution give the value of each of the first
four constants in (a) in terms of $\bar{x}$ or $\sigma$.

6. (a) Give the equation of the normal curve in both arbitrary coordinates and
standard units. State the relation between abscissas and between ordinates
in the two systems.

(b) State the properties of the normal curve.

7. Show how to fit a straight line $y = mx + k$ by the method of moments by
deriving the expressions for $m$ and $k$.

8. Show how to fit an exponential function by the method explained in the
text.

9. Show how to fit a parabola by the method of moments.

10. (a) Give two of the formulas for $r$. Discuss the use or uses of correlation
in any problem that occurs to you.

(b) Show that the slope of the line in problem 7 may be written
$r\sigma_y/\sigma_x$.

11. (a) Prove that $|r| \le 1$.

(b) Define the correlation ratio. Discuss its use.

12. Discuss rank correlation.

13. Derive the following relations:

$\bar{x} = c\bar{u} + x_0$
$\mu_2 = \nu_2 - \nu_1^2$
$\mu_{2:x} = c^2\mu_{2:u}$
$\sigma_x = c\sigma_u$.

14. The following is a reduced distribution of the breakfast checks at a
cafeteria. Using the indirect method find $\bar{x}$ and $\sigma_x$.

    x (cents)     f
      8-12        4
     13-17        8
     18-22       24
     23-27       21
     28-32       15
     33-37       14
     38-42        7
     43-47        4
     48-52        2
     53-57        1

Ans. $\bar{x} = 27.2$¢, $\sigma = 9.4$¢.

Derive the relations which give the third and fourth moments about the 
mean in terms of moments about an arbitrary origin. Define and ai. 
What information do they give? 

Compute the value of as and of 04 for the distribution in Exercise 14. 
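The indirect (coded) method asked for in the breakfast-check exercise can be sketched in Python as follows. This is only an illustration: the class marks 10, 15, ..., 55 are assumed to be the mid-values of the printed classes, the provisional origin 25 and unit 5 are arbitrary choices, and the function name `coded_moments` is hypothetical.

```python
import math

# Assumed class marks (mid-values of 8-12, 13-17, ...) and the printed frequencies.
marks = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
freqs = [4, 8, 24, 21, 15, 14, 7, 4, 2, 1]


def coded_moments(marks, freqs, origin, width):
    """Return (mean, sigma, alpha3, alpha4) via coded values u = (x - origin) / width."""
    n = sum(freqs)
    u = [(x - origin) / width for x in marks]
    # Moments of u about the provisional origin u = 0.
    nu = [sum(f * ui ** k for f, ui in zip(freqs, u)) / n for k in (1, 2, 3, 4)]
    nu1, nu2, nu3, nu4 = nu
    # Moments about the mean in terms of moments about the provisional origin.
    mu2 = nu2 - nu1 ** 2
    mu3 = nu3 - 3 * nu1 * nu2 + 2 * nu1 ** 3
    mu4 = nu4 - 4 * nu1 * nu3 + 6 * nu1 ** 2 * nu2 - 3 * nu1 ** 4
    mean = origin + width * nu1          # undo the change of origin and unit
    sigma = width * math.sqrt(mu2)
    # alpha3 and alpha4 are pure numbers, so no rescaling is needed.
    return mean, sigma, mu3 / mu2 ** 1.5, mu4 / mu2 ** 2


mean, sigma, alpha3, alpha4 = coded_moments(marks, freqs, origin=25, width=5)
print(round(mean, 1), round(sigma, 1))   # 27.2 9.4, agreeing with the printed answer
```

Choosing a different provisional origin changes the intermediate moments but not the final mean, standard deviation, or the two shape constants.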
17. The following is a distribution of the heights of students where x denotes
heights in inches and f is the number of students of the corresponding
heights. Find $\sigma_x$, $\alpha_3$, and $\alpha_4$.

      x        f
    60.5       1
    62.0       3
    63.5      14
    65.0      32
    66.5      61
    68.0      80
    69.5      71
    71.0      35
    72.5      24
    74.0       2
    75.5       1


18. For N values of a variable v it is known that $\sum v = 0$ and $\sum v^2 = N$. What
are the origin and unit of v?

19. Find in two ways the value of P for which the function

$\sum f(x - P)^2$

has the smallest value.

20. (Walker) An algebra test was given to 400 high school children, of whom
150 were boys and 250 were girls. The results were as follows:

$n_1 = 150$          $n_2 = 250$
$\bar{x}_1 = 72.5$   $\bar{x}_2 = 73.6$
$\sigma_1 = 7.0$     $\sigma_2 = 6.4$

Find the mean and standard deviation of the combined groups.
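One way to check an answer to the combined-groups exercise (150 boys, 250 girls) is the short Python sketch below. It assumes the two printed constants 7.0 and 6.4 are the standard deviations of the two groups, each taken about its own mean; the function name `combine` is hypothetical.

```python
import math


def combine(n1, mean1, sd1, n2, mean2, sd2):
    """Mean and standard deviation of two sets pooled into one."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Each set contributes its own variance plus the squared distance
    # of its mean from the combined mean.
    var = (n1 * (sd1 ** 2 + (mean1 - mean) ** 2)
           + n2 * (sd2 ** 2 + (mean2 - mean) ** 2)) / n
    return mean, math.sqrt(var)


mean, sd = combine(150, 72.5, 7.0, 250, 73.6, 6.4)
print(round(mean, 2), round(sd, 2))
```

The combined mean is simply the weighted mean of the two group means; the combined standard deviation is slightly larger than a weighted average of the two because the group means differ.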

21. For a normal distribution of 1500 students' grades, $\bar{x} = 75$, $\sigma_x = 10$. What
values of x will include the middle 500 grades? How many grades were
below 60; above 90?
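The grades exercise is meant to be worked from the table of normal areas, but the same quantities can be checked with Python's standard library. The sketch below assumes a mean of 75 and a standard deviation of 10, and treats the middle 500 of 1500 grades as the middle third of the area under the curve.

```python
from statistics import NormalDist

grades = NormalDist(mu=75, sigma=10)
n = 1500

# The middle 500 of 1500 grades lie between the 1/3 and 2/3 points of the area.
low = grades.inv_cdf(1 / 3)
high = grades.inv_cdf(2 / 3)

# Expected numbers of grades in the two tails.
below_60 = n * grades.cdf(60)
above_90 = n * (1 - grades.cdf(90))

print(round(low, 1), round(high, 1), round(below_60), round(above_90))
```

With the table, the same limits come from the value of t whose area from the mean is 1/6, and the tail counts from the area beyond t = 1.5.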

22. Suppose a distribution of 1000 breakfast checks from the cafeteria mentioned
in problem 14 showed the following results: $\bar{x} = 27$¢, $\sigma_x = 9$¢, $\alpha_3 = 0$,
$\alpha_4 = 3$. On the basis of these results what is the expected frequency in
the 23-27 class interval?

23. Given the following data as to the heights (y) and weights (x) of college men:

$\sum y = 6{,}800$, $\sum y^2 = 463{,}025$, $\sum xy = 1{,}022{,}250$

$\sum x = 15{,}000$, $\sum x^2 = 2{,}272{,}500$, $N = 100$.

Find $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, r.
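The computation from sums in the heights-and-weights exercise can be sketched as below. One assumption should be noted: the sum of squares of y is taken here as 463,025, the value for which the variance of y is positive (a reading of 403,025 would make it negative). The variable names are illustrative only.

```python
import math

N = 100
sum_x, sum_x2 = 15_000, 2_272_500
sum_y, sum_y2 = 6_800, 463_025      # assumed value; see the note above
sum_xy = 1_022_250

mean_x = sum_x / N
mean_y = sum_y / N
# Second moments about the mean from moments about the origin.
sigma_x = math.sqrt(sum_x2 / N - mean_x ** 2)
sigma_y = math.sqrt(sum_y2 / N - mean_y ** 2)
# Product moment about the means, then the correlation coefficient.
p_xy = sum_xy / N - mean_x * mean_y
r = p_xy / (sigma_x * sigma_y)

print(mean_x, mean_y, sigma_x, sigma_y, round(r, 3))
```

Working from the raw sums in this way avoids forming the individual deviations, which is the point of the short formulas for r.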

24. Derive the expression for the standard error of estimate,

$S_y = \sigma_y(1 - r^2)^{1/2}$.

25. Discuss the use of $S_y$ in predictions.
26. Compute the median, quartiles, and quartile deviation for the following
distribution where x = bushels per acre and f = corresponding frequency.

      x        f
      1        3
      3       26
      5       78
      7      107
      9      113
     11       65
     13       40
     15       22
     17       45
     19       41
     21       21
     23       23


27. (a) Find r for the following table using (u, v) coordinates.

    y \ x     17    19    21    23    f(y)
     18              3     2     1      6
     15        2     4     3     1     10
     12        2     1     1            4
    f(x)       4     8     6     2     20

(b) For the above data, find $\bar{x}$, $\sigma_y$, and the equations of the
regression lines.

28. For Table 38, (a) find the correlation coefficient, (b) find the equations of the
lines of regression, (c) locate the coordinate axes through the arithmetic
mean of the table and plot the lines obtained in (b).

29 . Fit an exponential function of the type y = to the following data: 


    x      0      2      4
    y      2     10    100


First find the equation in the forms
(a) Y = au + b
(b) Y = mx + k
and then determine A and B.
Table 38. Correlation Table for Monthly Rainfall at Iowa City and
Des Moines, 1890-1925
30. How does the scatter diagram assist one in deciding whether the regression is
linear or non-linear? Give the formulas for the correlation coefficient
and for the correlation ratio of y on x, explaining the meaning of the letters
used. How would you use these indices of correlation to decide whether
the regression of y on x is linear or non-linear?

31. (a) In a normal distribution in which $\bar{x} = 0$ and $\sigma_x = 4$, what proportion
of the data lie where x > 12?

(b) If 100 of the data lie between x = -6 and x = -8, how many of the data
are there in the whole distribution?

32. (a) When the variates are ungrouped what is perhaps the best formula
for $\sigma_x$? Ans.

$$\sigma_x = \frac{\left[N\sum x^2 - \left(\sum x\right)^2\right]^{1/2}}{N}$$

(b) What does this expression become in terms of N when x refers to the
integers from 1 to N?
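A numerical check of the ungrouped formula for the standard deviation, applied to the integers from 1 to N, might look like the sketch below. The closed form it is compared against, the square root of (N² − 1)/12, is the standard result for the first N integers and is supplied here as an aid, not quoted from the text; the function name is hypothetical.

```python
import math


def sigma_ungrouped(xs):
    """Standard deviation from raw sums: sqrt(N*sum(x^2) - (sum x)^2) / N."""
    n = len(xs)
    return math.sqrt(n * sum(x * x for x in xs) - sum(xs) ** 2) / n


for n in (2, 7, 50):
    integers = list(range(1, n + 1))
    closed_form = math.sqrt((n * n - 1) / 12)
    print(n, sigma_ungrouped(integers), closed_form)
```

The agreement for several values of N is a check on an algebraic answer, not a substitute for deriving it from the sums of the first N integers and their squares.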

33. (a) Expand $(a + b + c + d)^2$.

(b) The expansion of $(x_1 + x_2 + \cdots + x_n)^2$ consists of the sum of the
squares of the x's plus the sum of their products taken two at a time.
Express this expansion in summation notation.

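34. (a) Show that the formula for MD may be written

$$MD = \frac{2}{N}\left[\bar{x}\sum_{x_i<\bar{x}} f_i - \sum_{x_i<\bar{x}} f_i x_i\right]$$

Hint. For $x_i < \bar{x}$, $\sum f_i|x_i - \bar{x}| = -\sum f_i(x_i - \bar{x}) = \sum f_i(\bar{x} - x_i) = \bar{x}\sum f_i - \sum f_i x_i$.

For $x_i > \bar{x}$, $\sum f_i|x_i - \bar{x}| = -\sum f_i(\bar{x} - x_i)$.

Since $\bar{x}$ is the centroid (§14, Chapter III), $-\sum f_i(\bar{x} - x_i)$ for $x_i > \bar{x}$ equals
$\sum f_i(\bar{x} - x_i)$ for $x_i < \bar{x}$.

(b) Using this formula evaluate MD for one of the distributions in the text.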
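35. Given N pairs of variates: $(x_{11}, x_{21}); (x_{12}, x_{22}); (x_{13}, x_{23}); \ldots; (x_{1N}, x_{2N})$.
Show that:

(a) the mean $\bar{x}$ of all the variates is

$$\bar{x} = \frac{1}{2N}\sum_{i=1}^{N}(x_{1i} + x_{2i}),$$

(b) the variance $\sigma^2$ taken about the $\bar{x}$ in (a) is

$$\sigma^2 = \frac{1}{2N}\left[\sum_{i=1}^{N}(x_{1i} - \bar{x})^2 + \sum_{i=1}^{N}(x_{2i} - \bar{x})^2\right].$$

Note. The quantity

$$\frac{1}{N\sigma^2}\sum_{i=1}^{N}(x_{1i} - \bar{x})(x_{2i} - \bar{x}),$$

where $\bar{x}$ and $\sigma^2$ are defined as in (a) and (b), is called the intra-class
correlation coefficient. For its use see Statistical Methods for Research Workers,
Fisher (§38), Oliver and Boyd, London.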





36. Let $S_r = \sum_{x=1}^{N} x^r$. Prove that $S_1 = N(N + 1)/2$,

$S_2 = N(N + 1)(2N + 1)/6$, $S_3 = S_1^2$.
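Before proving the three relations it may help to confirm them numerically. The sketch below takes $S_r$ to mean the sum of the r-th powers of the integers from 1 to N, which is an interpretation of the notation in this exercise; the function name `power_sum` is hypothetical.

```python
def power_sum(n, r):
    """Sum of the r-th powers of the integers 1, 2, ..., n."""
    return sum(x ** r for x in range(1, n + 1))


for n in (1, 4, 10, 25):
    s1, s2, s3 = power_sum(n, 1), power_sum(n, 2), power_sum(n, 3)
    # Integer arithmetic throughout, so the comparisons are exact.
    print(n,
          s1 == n * (n + 1) // 2,
          s2 == n * (n + 1) * (2 * n + 1) // 6,
          s3 == s1 ** 2)
```

A check of this kind shows that the relations hold for the values tried; the proof itself is usually by induction on N.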

37. Sketch the graph of y =      , $-\infty < x < \infty$, when (a) both A and B are
positive, (b) A is positive and B negative, (c) A is negative and B positive,
(d) both A and B are negative.

38. A large number of rectangles are drawn all having the same perimeter but
different bases (x) and altitudes (y). Which of the following is the correct
answer? The coefficient of correlation between x and y is (a) negative
and numerically large, (b) positive and numerically small, (c) positive
and numerically large, (d) approximately zero.
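The rectangle question can be explored numerically. In the sketch below the perimeter (20 units) and the list of bases are made-up values chosen only for illustration; each altitude follows from the base because the perimeter is fixed.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


perimeter = 20.0                                   # hypothetical common perimeter
bases = [1.0, 2.5, 4.0, 6.0, 7.5, 9.0]             # hypothetical bases x
altitudes = [perimeter / 2 - x for x in bases]     # y = perimeter/2 - x

r = pearson_r(bases, altitudes)
print(round(r, 6))
```

Because y is an exact linear function of x with negative slope, every point lies on one descending straight line, which is what the printed value of r reflects.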

39. For N correlated values of x and y the regression equation of y on x is found
to be y = 1 + ir. If $\bar{x} = 0$, r = 0.5, and $\sigma_x = 1$, determine $\bar{y}$ and $S_y$.

40. Let $NS_y^2$ denote the sum of squares of deviations from the line of least
squares (Case I).

(a) Show that $NS_y^2 = \sum y^2 - m\sum xy - k\sum y$.

Hint. $NS_y^2 = \sum (y - mx - k)^2$

$= \sum y(y - mx - k) - m\sum x(y - mx - k) - k\sum (y - mx - k)$.

The last two expressions vanish. Why?

(b) If m and k are replaced by their determinant values from (5), p. 143,
show that


The third order determinant is D bordered by $\sum y$, $\sum xy$.

(c) If x and y are replaced by x' and y', denoting deviations from their
respective means, find the values of the resulting determinants in (b).

(d) From the results in (c) show that $S_y^2 = \sigma_y^2(1 - r^2)$.
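The identity in part (a) can be checked on a small set of points before it is proved. In the sketch below the five (x, y) pairs are invented for illustration, and m and k are found from the two normal equations of the least-squares line.

```python
import math

# Hypothetical data, chosen only to exercise the identity.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
syy = sum(y * y for y in ys)

# Solve the normal equations  m*sxx + k*sx = sxy,  m*sx + k*n = sy.
d = n * sxx - sx * sx
m = (n * sxy - sx * sy) / d
k = (sxx * sy - sx * sxy) / d

# Sum of squared deviations from the fitted line, computed two ways.
residual_ss = sum((y - m * x - k) ** 2 for x, y in zip(xs, ys))
shortcut = syy - m * sxy - k * sy

print(round(residual_ss, 6), round(shortcut, 6))
print(math.isclose(residual_ss, shortcut, rel_tol=1e-9, abs_tol=1e-9))
```

The shortcut works because the normal equations make the residuals sum to zero and make them uncorrelated with x, which is the reason the last two expressions in the hint vanish.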

41. Discuss the properties of the normal correlation surface and their use in
passing judgment on the reliability of predictions based upon the regression
line of y on x.

42. (For calculus students) In fitting points in a plane by a line so that the
sum of squares of perpendicular deviations shall be a minimum, a second
line may be found for which the sum of squares of perpendicular deviations
is a maximum. If $\sum d^2$ is the sum of squares of deviations from the
first line and is the sum of squares of deviations from the second line,
show that Z^dVS^^ * (1 +^)/(l — [Reference: Bulletin American
Mathematical Society, vol. 47 (1941), p. 710.]
Tables

I. Ordinates and Areas of the Normal Curve.

II. Common Logarithms of Numbers to Five Decimal Places. 
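Entries of both tables can be regenerated when a printed value is in doubt. The sketch below is one way to do so with Python's `math` module; the function names `ordinate`, `area`, and `mantissa` are illustrative, and the area is measured from the mean (t = 0) to t, as in Table I.

```python
import math


def ordinate(t):
    """Ordinate of the normal curve in standard units at t."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def area(t):
    """Area under the normal curve between 0 and t."""
    return 0.5 * math.erf(t / math.sqrt(2))


def mantissa(n):
    """Five-place mantissa of the common logarithm of n, as an integer."""
    return round((math.log10(n) % 1) * 100_000)


print(round(ordinate(1.0), 5), round(area(1.0), 5))   # Table I entries for t = 1.00
print(mantissa(154), mantissa(155))                   # Table II entries for 154 and 155
```

The characteristic of a logarithm is not tabulated; it is supplied from the position of the decimal point, so `mantissa` keeps only the fractional part.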




Table I. Ordinates and Areas of the Normal Curve, $\phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$

(The column headed Area gives $\int_0^t \phi(t)\,dt$.)

    t     φ(t)     Area         t     φ(t)     Area         t     φ(t)     Area

   .00   .39894   .00000       .45   .36053   .17364       .90   .26609   .31594
   .01   .39892   .00399       .46   .35889   .17724       .91   .26369   .31859
   .02   .39886   .00798       .47   .35723   .18082       .92   .26129   .32121
   .03   .39876   .01197       .48   .35553   .18439       .93   .25888   .32381
   .04   .39862   .01595       .49   .35381   .18793       .94   .25647   .32639
   .05   .39844   .01994       .50   .35207   .19146       .95   .25406   .32894
   .06   .39822   .02392       .51   .35029   .19497       .96   .25164   .33147
   .07   .39797   .02790       .52   .34849   .19847       .97   .24923   .33398
   .08   .39767   .03188       .53   .34667   .20194       .98   .24681   .33646
   .09   .39733   .03586       .54   .34482   .20540       .99   .24439   .33891
   .10   .39695   .03983       .55   .34294   .20884      1.00   .24197   .34134
   .11   .39654   .04380       .56   .34105   .21226      1.01   .23955   .34375
   .12   .39608   .04776       .57   .33912   .21566      1.02   .23713   .34614
   .13   .39559   .05172       .58   .33718   .21904      1.03   .23471   .34850
   .14   .39505   .05567       .59   .33521   .22240      1.04   .23230   .35083
   .15   .39448   .05962       .60   .33322   .22575      1.05   .22988   .35314
   .16   .39387   .06356       .61   .33121   .22907      1.06   .22747   .35543
   .17   .39322   .06749       .62   .32918   .23237      1.07   .22506   .35769
   .18   .39253   .07142       .63   .32713   .23565      1.08   .22265   .35993
   .19   .39181   .07535       .64   .32506   .23891      1.09   .22025   .36214
   .20   .39104   .07926       .65   .32297   .24215      1.10   .21785   .36433
   .21   .39024   .08317       .66   .32086   .24537      1.11   .21546   .36650
   .22   .38940   .08706       .67   .31874   .24857      1.12   .21307   .36864
   .23   .38853   .09095       .68   .31659   .25175      1.13   .21069   .37076
   .24   .38762   .09483       .69   .31443   .25490      1.14   .20831   .37286
   .25   .38667   .09871       .70   .31225   .25804      1.15   .20594   .37493
   .26   .38568   .10257       .71   .31006   .26115      1.16   .20357   .37698
   .27   .38466   .10642       .72   .30785   .26424      1.17   .20121   .37900
   .28   .38361   .11026       .73   .30563   .26730      1.18   .19886   .38100
   .29   .38251   .11409       .74   .30339   .27035      1.19   .19652   .38298
   .30   .38139   .11791       .75   .30114   .27337      1.20   .19419   .38493
   .31   .38023   .12172       .76   .29887   .27637      1.21   .19186   .38686
   .32   .37903   .12552       .77   .29659   .27935      1.22   .18954   .38877
   .33   .37780   .12930       .78   .29431   .28230      1.23   .18724   .39065
   .34   .37654   .13307       .79   .29200   .28524      1.24   .18494   .39251
   .35   .37524   .13683       .80   .28969   .28814      1.25   .18265   .39435
   .36   .37391   .14058       .81   .28737   .29103      1.26   .18037   .39617
   .37   .37255   .14431       .82   .28504   .29389      1.27   .17810   .39796
   .38   .37115   .14803       .83   .28269   .29673      1.28   .17585   .39973
   .39   .36973   .15173       .84   .28034   .29955      1.29   .17360   .40147
   .40   .36827   .15542       .85   .27798   .30234      1.30   .17137   .40320
   .41   .36678   .15910       .86   .27562   .30511      1.31   .16915   .40490
   .42   .36526   .16276       .87   .27324   .30785      1.32   .16694   .40658
   .43   .36371   .16640       .88   .27086   .31057      1.33   .16474   .40824
   .44   .36213   .17003       .89   .26848   .31327      1.34   .16256   .40988
Table I (continued)

    t     φ(t)     Area         t     φ(t)     Area         t     φ(t)     Area

  1.35   .16038   .41149      1.80   .07895   .46407      2.25   .03174   .48778
  1.36   .15822   .41309      1.81   .07754   .46485      2.26   .03103   .48809
  1.37   .15608   .41466      1.82   .07614   .46562      2.27   .03034   .48840
  1.38   .15395   .41621      1.83   .07477   .46638      2.28   .02965   .48870
  1.39   .15183   .41774      1.84   .07341   .46712      2.29   .02898   .48899
  1.40   .14973   .41924      1.85   .07206   .46784      2.30   .02833   .48928
  1.41   .14764   .42073      1.86   .07074   .46856      2.31   .02768   .48956
  1.42   .14556   .42220      1.87   .06943   .46926      2.32   .02705   .48983
  1.43   .14350   .42364      1.88   .06814   .46995      2.33   .02643   .49010
  1.44   .14146   .42507      1.89   .06687   .47062      2.34   .02582   .49036
  1.45   .13943   .42647      1.90   .06562   .47128      2.35   .02522   .49061
  1.46   .13742   .42786      1.91   .06438   .47193      2.36   .02463   .49086
  1.47   .13542   .42922      1.92   .06316   .47257      2.37   .02406   .49111
  1.48   .13344   .43056      1.93   .06195   .47320      2.38   .02349   .49134
  1.49   .13147   .43189      1.94   .06077   .47381      2.39   .02294   .49158
  1.50   .12952   .43319      1.95   .05959   .47441      2.40   .02239   .49180
  1.51   .12758   .43448      1.96   .05844   .47500      2.41   .02186   .49202
  1.52   .12566   .43574      1.97   .05730   .47558      2.42   .02134   .49224
  1.53   .12376   .43699      1.98   .05618   .47615      2.43   .02083   .49245
  1.54   .12188   .43822      1.99   .05508   .47670      2.44   .02033   .49266
  1.55   .12001   .43943      2.00   .05399   .47725      2.45   .01984   .49286
  1.56   .11816   .44062      2.01   .05292   .47778      2.46   .01936   .49305
  1.57   .11632   .44179      2.02   .05186   .47831      2.47   .01889   .49324
  1.58   .11450   .44295      2.03   .05082   .47882      2.48   .01842   .49343
  1.59   .11270   .44408      2.04   .04980   .47932      2.49   .01797   .49361
  1.60   .11092   .44520      2.05   .04879   .47982      2.50   .01753   .49379
  1.61   .10915   .44630      2.06   .04780   .48030      2.51   .01709   .49396
  1.62   .10741   .44738      2.07   .04682   .48077      2.52   .01667   .49413
  1.63   .10567   .44845      2.08   .04586   .48124      2.53   .01625   .49430
  1.64   .10396   .44950      2.09   .04491   .48169      2.54   .01585   .49446
  1.65   .10226   .45053      2.10   .04398   .48214      2.55   .01545   .49461
  1.66   .10059   .45154      2.11   .04307   .48257      2.56   .01506   .49477
  1.67   .09893   .45254      2.12   .04217   .48300      2.57   .01468   .49492
  1.68   .09728   .45352      2.13   .04128   .48341      2.58   .01431   .49506
  1.69   .09566   .45449      2.14   .04041   .48382      2.59   .01394   .49520
  1.70   .09405   .45543      2.15   .03955   .48422      2.60   .01358   .49534
  1.71   .09246   .45637      2.16   .03871   .48461      2.61   .01323   .49547
  1.72   .09089   .45728      2.17   .03788   .48500      2.62   .01289   .49560
  1.73   .08933   .45818      2.18   .03706   .48537      2.63   .01256   .49573
  1.74   .08780   .45907      2.19   .03626   .48574      2.64   .01223   .49585
  1.75   .08628   .45994      2.20   .03547   .48610      2.65   .01191   .49598
  1.76   .08478   .46080      2.21   .03470   .48645      2.66   .01160   .49609
  1.77   .08329   .46164      2.22   .03394   .48679      2.67   .01130   .49621
  1.78   .08183   .46246      2.23   .03319   .48713      2.68   .01100   .49632
  1.79   .08038   .46327      2.24   .03246   .48745      2.69   .01071   .49643
Table I (continued)

    t     φ(t)     Area         t     φ(t)     Area         t     φ(t)     Area

  2.70   .01042   .49653      3.15   .00279   .49918      3.60   .00061   .49984
  2.71   .01014   .49664      3.16   .00271   .49921      3.61   .00059   .49985
  2.72   .00987   .49674      3.17   .00262   .49924      3.62   .00057   .49985
  2.73   .00961   .49683      3.18   .00254   .49926      3.63   .00055   .49986
  2.74   .00935   .49693      3.19   .00246   .49929      3.64   .00053   .49986
  2.75   .00909   .49702      3.20   .00238   .49931      3.65   .00051   .49987
  2.76   .00885   .49711      3.21   .00231   .49934      3.66   .00049   .49987
  2.77   .00861   .49720      3.22   .00224   .49936      3.67   .00047   .49988
  2.78   .00837   .49728      3.23   .00216   .49938      3.68   .00046   .49988
  2.79   .00814   .49736      3.24   .00210   .49940      3.69   .00044   .49989
  2.80   .00792   .49744      3.25   .00203   .49942      3.70   .00042   .49989
  2.81   .00770   .49752      3.26   .00196   .49944      3.71   .00041   .49990
  2.82   .00748   .49760      3.27   .00190   .49946      3.72   .00039   .49990
  2.83   .00727   .49767      3.28   .00184   .49948      3.73   .00038   .49990
  2.84   .00707   .49774      3.29   .00178   .49950      3.74   .00037   .49991
  2.85   .00687   .49781      3.30   .00172   .49952      3.75   .00035   .49991
  2.86   .00668   .49788      3.31   .00167   .49953      3.76   .00034   .49992
  2.87   .00649   .49795      3.32   .00161   .49955      3.77   .00033   .49992
  2.88   .00631   .49801      3.33   .00156   .49957      3.78   .00031   .49992
  2.89   .00613   .49807      3.34   .00151   .49958      3.79   .00030   .49992
  2.90   .00595   .49813      3.35   .00146   .49960      3.80   .00029   .49993
  2.91   .00578   .49819      3.36   .00141   .49961      3.81   .00028   .49993
  2.92   .00562   .49825      3.37   .00136   .49962      3.82   .00027   .49993
  2.93   .00545   .49831      3.38   .00132   .49964      3.83   .00026   .49994
  2.94   .00530   .49836      3.39   .00127   .49965      3.84   .00025   .49994
  2.95   .00514   .49841      3.40   .00123   .49966      3.85   .00024   .49994
  2.96   .00499   .49846      3.41   .00119   .49968      3.86   .00023   .49994
  2.97   .00485   .49851      3.42   .00115   .49969      3.87   .00022   .49995
  2.98   .00471   .49856      3.43   .00111   .49970      3.88   .00021   .49995
  2.99   .00457   .49861      3.44   .00107   .49971      3.89   .00021   .49995
  3.00   .00443   .49865      3.45   .00104   .49972      3.90   .00020   .49995
  3.01   .00430   .49869      3.46   .00100   .49973      3.91   .00019   .49995
  3.02   .00417   .49874      3.47   .00097   .49974      3.92   .00018   .49996
  3.03   .00405   .49878      3.48   .00094   .49975      3.93   .00018   .49996
  3.04   .00393   .49882      3.49   .00090   .49976      3.94   .00017   .49996
  3.05   .00381   .49886      3.50   .00087   .49977      3.95   .00016   .49996
  3.06   .00370   .49889      3.51   .00084   .49978      3.96   .00016   .49996
  3.07   .00358   .49893      3.52   .00081   .49978      3.97   .00015   .49996
  3.08   .00348   .49897      3.53   .00079   .49979      3.98   .00014   .49997
  3.09   .00337   .49900      3.54   .00076   .49980      3.99   .00014   .49997
  3.10   .00327   .49903      3.55   .00073   .49981
  3.11   .00317   .49906      3.56   .00071   .49981
  3.12   .00307   .49910      3.57   .00068   .49982
  3.13   .00298   .49913      3.58   .00066   .49983
  3.14   .00288   .49916      3.59   .00063   .49983
Table II. Common Logarithms of Numbers to Five Decimal Places

Reprinted by permission from "Plane Trigonometry" by Simmons and Gore, John Wiley
INDEX


Arithmetic mean, 33
  short methods of computing, 39
  of sub-sets, 44, 193
Array, 193
Asymmetry, see skewness
Averages, Chapter III
  discussion of different, 51, 52-58
Average deviation, see mean deviation

Burr, I. W., 111 ft. nt.

Charlier check, 66, 87
Charts, 24
  ratio, 157
Classification of data, 9-15
Class
  boundary, 15
  interval, 11
  limits, 15
  marks, 11
  mid-value of, 11
Coefficient
  of alienation, 185
  of correlation, Chapter VII
  of variation, 90
Collateral reading, 5
Combination of sets, 99
Compound interest law, 156
Computing machines, 4, 71
Constant, 7
Correlation
  and regression, 178
  coefficient, Chapter VIII
  rank, 222
  ratio, 212
  relation to common causes, 225
  interpretation of, 225
  intraclass, 232
  surface, 208
  table, 189
Cumulative frequencies, 16, 27, 132
Curve of error, see normal curve
Curve fitting, Chapter VII, 124
Curves of growth, 53, 152, 164, 166

Deviation, 36
  mean or average, 84
  root-mean-square, 87 ft. nt., 99
Dispersion, see measures of
  relative, 90
Dwyer, P. S., 176

Estimate, standard error of, 179

Frequency
  curves, 25, 112
  distributions, Chapter I
  graphical representation of, Chapter II
  polygon, 24
Function, definition, 22
  exponential, 152
  frequency, 112
  linear, 137
  parabolic, 162
  quadratic, 138

Geometric mean, 52
Gompertz curve, 164
Graduation by means of normal curve, 128
Graphical representation, Chapter II, VII

Harmonic mean, 55
Histogram, 25
Hotelling, H., 167 ft. nt.
Huntington, E. V., 167

Kendall, M. G., 51 ft. nt.
Kurtosis, 73, 109

Least-squares method, 144
Logarithmic paper, 161
Logistic curve, 166

Makeham's law, 167
Mean
  arithmetic, 33
  geometric, 52
  harmonic, 55
  of means, 43
Mean deviation, 84
Measures of dispersion, Chapter V
  mean deviation, 84
  quartiles, 82
  semi-interquartile range, 82
  standard deviation, 86
Median, 47
Mode, 47
Moment of a distribution, Chapter IV
  method of, 141

Normal curve, Chapter VI
  explanation of tables of, 116
  fitted to observed data, 124
  properties of, 118
  standard form of, 115
Normal equations, 145

Ogive, 27

Parabola, fitting a, 162
Parameter, 115, 124, 141, 153
Percentiles, 84
Probability, 131
Probability paper, 132

Quartiles, 82
  of normal curve, 119

Range, 16
Ratio charts, 157
Reed-Pearl curve, 166
Regression
  coefficients, 178
  linear, 177
  non-linear, 212
  testing linearity of, 217
Residuals, 144
Rietz, H. L., 114 ft. nt.

Scatter diagram, 171
Semi-logarithmic paper, 157
Sheppard's corrections, 78, 88
Shewhart, W. A., 75
Skewness, 73, 109
Snedecor, G. W., 178
Standard units, 69
Statistic, 124
Standard deviation, 68
  of combination of sets, 99
  of grouped data, 86
  of ungrouped data, 93
Straight line, 137
  fitting to data, 140
Symmetry, 73, 109

Tables
  areas under normal curve, Appendix
  logarithms of numbers, Appendix
  ordinates of normal curve, Appendix
Tabulation, 9
Time series, 150
Translation of axes, 36
Trend, 150, 162

Variability, see dispersion
Variable, 7
Variance, 87
Variates, 7

Walker, Helen M., 199
Weighted mean, 33
Wilkens, J. E., 74 ft. nt.
Wilson, E. B., 226



